Overview

Dataset statistics

Number of variables: 35
Number of observations: 1230000
Missing cells: 1443926
Missing cells (%): 3.4%
Duplicate rows: 6763
Duplicate rows (%): 0.5%
Total size in memory: 370.1 MiB
Average record size in memory: 315.5 B
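
The percentages and the average record size in this table follow directly from the raw counts. A minimal sketch of that arithmetic, using only figures reported in this overview; the variable names are illustrative, and MiB is assumed to mean 1024² bytes:

```python
# Figures copied from the "Dataset statistics" table of this report.
n_variables = 35
n_observations = 1_230_000
missing_cells = 1_443_926
duplicate_rows = 6_763
total_size_mib = 370.1  # assumed to be mebibytes (1024**2 bytes)

# A "cell" is one value of one variable in one observation.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Recomputing the derived figures this way is a quick consistency check when the report is copied into other documents.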

Variable types

Categorical: 22
Numeric: 12
Boolean: 1

Alerts

FN has constant value "1.0" Constant
Active has constant value "1.0" Constant
Dataset has 6763 (0.5%) duplicate rows Duplicates
customer_id has a high cardinality: 557593 distinct values High cardinality
prod_name has a high cardinality: 38202 distinct values High cardinality
product_type_name has a high cardinality: 127 distinct values High cardinality
department_name has a high cardinality: 249 distinct values High cardinality
section_name has a high cardinality: 56 distinct values High cardinality
detail_desc has a high cardinality: 36009 distinct values High cardinality
postal_code has a high cardinality: 254541 distinct values High cardinality
article_id is highly correlated with product_code and 1 other fields High correlation
price is highly correlated with section_name High correlation
product_code is highly correlated with article_id and 1 other fields High correlation
product_type_no is highly correlated with product_group_name and 6 other fields High correlation
graphical_appearance_no is highly correlated with graphical_appearance_name and 3 other fields High correlation
colour_group_code is highly correlated with graphical_appearance_name and 3 other fields High correlation
perceived_colour_value_id is highly correlated with graphical_appearance_no and 5 other fields High correlation
perceived_colour_master_id is highly correlated with graphical_appearance_name and 6 other fields High correlation
department_no is highly correlated with product_type_no and 9 other fields High correlation
section_no is highly correlated with product_group_name and 8 other fields High correlation
garment_group_no is highly correlated with product_type_no and 10 other fields High correlation
age is highly correlated with FN and 1 other fields High correlation
sales_channel_id is highly correlated with FN and 1 other fields High correlation
sale is highly correlated with FN and 1 other fields High correlation
product_group_name is highly correlated with product_type_no and 9 other fields High correlation
graphical_appearance_name is highly correlated with product_group_name and 12 other fields High correlation
colour_group_name is highly correlated with product_group_name and 11 other fields High correlation
perceived_colour_value_name is highly correlated with graphical_appearance_no and 5 other fields High correlation
perceived_colour_master_name is highly correlated with graphical_appearance_name and 6 other fields High correlation
index_code is highly correlated with product_type_no and 11 other fields High correlation
index_name is highly correlated with product_type_no and 11 other fields High correlation
index_group_no is highly correlated with department_no and 7 other fields High correlation
index_group_name is highly correlated with department_no and 7 other fields High correlation
section_name is highly correlated with article_id and 16 other fields High correlation
garment_group_name is highly correlated with product_type_no and 11 other fields High correlation
FN is highly correlated with perceived_colour_master_name and 15 other fields High correlation
Active is highly correlated with perceived_colour_master_name and 15 other fields High correlation
club_member_status is highly correlated with FN and 1 other fields High correlation
fashion_news_frequency is highly correlated with FN and 1 other fields High correlation
FN has 708745 (57.6%) missing values Missing
Active has 716695 (58.3%) missing values Missing
graphical_appearance_no is highly skewed (γ1 = -60.75114289) Skewed
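
The "High cardinality" alerts in this list are consistent with a simple rule: flag a categorical variable when its distinct count exceeds a threshold. The sketch below assumes a threshold of 50. That value is inferred from this report (section_name with 56 distinct values is flagged, colour_group_name with 50 is not) and is not read from the report's config.json; the helper name is hypothetical.

```python
# Distinct counts copied from this report.
distinct_counts = {
    "customer_id": 557_593,
    "prod_name": 38_202,
    "product_type_name": 127,
    "section_name": 56,
    "colour_group_name": 50,
    "graphical_appearance_name": 30,
    "product_group_name": 19,
}

# Assumed threshold; inferred from the alerts, not taken from config.json.
CARDINALITY_THRESHOLD = 50


def high_cardinality(counts, threshold=CARDINALITY_THRESHOLD):
    """Return the variables whose distinct count exceeds the threshold (hypothetical helper)."""
    return [name for name, n in counts.items() if n > threshold]


flagged = high_cardinality(distinct_counts)
print(flagged)
```

Variables flagged this way are usually poor candidates for one-hot encoding, which is why the alert is worth acting on before modelling.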

Reproduction

Analysis started: 2022-11-10 15:56:10.585625
Analysis finished: 2022-11-10 16:02:44.826929
Duration: 6 minutes and 34.24 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

customer_id
Categorical

HIGH CARDINALITY

Distinct: 557593
Distinct (%): 45.3%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Value                                                             Count
b4db5e5259234574edfff958e170fe3a5e13b6f146752ca066abca3c156acc71  59
be1981ab818cf4ef6765b2ecaea7a2cbf14ccd6e8a7ee985513d9e8e53c6d91b  59
49beaacac0c7801c2ce2d189efe525fe80b5d37e46ed05b50a4cd88e34d0748f  53
8df45859ccd71ef1e48e2ee9d1c65d5728c31c46ae957d659fa4e5c3af6cc076  52
a65f77281a528bf5c1e9f270141d601d116e1df33bf9df512f495ee06647a9cc  51
Other values (557588)                                             1229726

Length

Max length: 64
Median length: 64
Mean length: 64
Min length: 64

Characters and Unicode

Total characters: 78720000
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
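
As an illustration of those character properties, the sketch below classifies the characters of one customer_id with Python's standard `unicodedata` module. It is not the code path used by the profiling tool, and it covers only general categories and the ASCII block; the standard library does not expose Unicode scripts.

```python
import unicodedata
from collections import Counter

# First sample value of customer_id shown in this report.
sample = "f05a521a2649a53841d0c5c837efb1d48e2eff7a6f6e47f94f0e21665d7adaa3"

# General category per character: "Nd" = Decimal Number, "Ll" = Lowercase Letter.
categories = Counter(unicodedata.category(ch) for ch in sample)

# The ASCII block is the code point range U+0000..U+007F.
all_ascii = all(ord(ch) < 128 for ch in sample)

print(dict(categories), all_ascii)
```

A 64-character lowercase hexadecimal string yields exactly two categories and one block, which matches the "Distinct categories" and "Distinct blocks" figures for this variable.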

Unique

Unique: 293373
Unique (%): 23.9%

Sample

1st row: f05a521a2649a53841d0c5c837efb1d48e2eff7a6f6e47f94f0e21665d7adaa3
2nd row: 58afa373cb889cda30831ba3ca728bbb4147d5c1f3d19060f003bf5713d7f4f5
3rd row: 317ea97640e31f706565f2b61f17652ac569f05c1abc47fdf9fb2c4b446ca343
4th row: 6559a47c9760bc36d3f7a7497306daa1ea9ce4a3a340a0abfe07325b76f4cd1e
5th row: 10292f992bbf7a999f8f2eee6c1b2de299ee1279e369223b73c8baf6d65fce21

Common Values

Value                                                             Count    Frequency (%)
b4db5e5259234574edfff958e170fe3a5e13b6f146752ca066abca3c156acc71  59       < 0.1%
be1981ab818cf4ef6765b2ecaea7a2cbf14ccd6e8a7ee985513d9e8e53c6d91b  59       < 0.1%
49beaacac0c7801c2ce2d189efe525fe80b5d37e46ed05b50a4cd88e34d0748f  53       < 0.1%
8df45859ccd71ef1e48e2ee9d1c65d5728c31c46ae957d659fa4e5c3af6cc076  52       < 0.1%
a65f77281a528bf5c1e9f270141d601d116e1df33bf9df512f495ee06647a9cc  51       < 0.1%
cd04ec2726dd58a8c753e0d6423e57716fd9ebcf2f14ed6012e7e5bea016b4d6  46       < 0.1%
e6498c7514c61d3c24669f49753dc83fdff3ec1ba13902dd9184c959d8f0b249  46       < 0.1%
c140410d72a41ee5e2e3ba3d7f5a860f337f1b5e41c27cf9bda5517c8774f8fa  45       < 0.1%
e97c3a6c680cd3569df10f901a61fdffaf8f70300f6adf6e266b80c87d54245a  45       < 0.1%
6cc121e5cc202d2bf344ffe795002bdbf87178054bcda2e57161f0ef810a4b55  45       < 0.1%
Other values (557583)                                             1229499  > 99.9%

Length

[Figure: histogram of lengths of the category]

Most occurring characters

Value             Count     Frequency (%)
2                 4926056   6.3%
9                 4925571   6.3%
a                 4924966   6.3%
e                 4924738   6.3%
8                 4924482   6.3%
4                 4920700   6.3%
1                 4920449   6.3%
f                 4920159   6.3%
6                 4919620   6.2%
0                 4919119   6.2%
Other values (6)  29494140  37.5%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    49201514  62.5%
Lowercase Letter  29518486  37.5%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
2      4926056  10.0%
9      4925571  10.0%
8      4924482  10.0%
4      4920700  10.0%
1      4920449  10.0%
6      4919620  10.0%
0      4919119  10.0%
5      4917939  10.0%
3      4914812  10.0%
7      4912766  10.0%

Lowercase Letter
Value  Count    Frequency (%)
a      4924966  16.7%
e      4924738  16.7%
f      4920159  16.7%
c      4918898  16.7%
d      4917989  16.7%
b      4911736  16.6%

Most occurring scripts

Value   Count     Frequency (%)
Common  49201514  62.5%
Latin   29518486  37.5%

Most frequent character per script

Common
Value  Count    Frequency (%)
2      4926056  10.0%
9      4925571  10.0%
8      4924482  10.0%
4      4920700  10.0%
1      4920449  10.0%
6      4919620  10.0%
0      4919119  10.0%
5      4917939  10.0%
3      4914812  10.0%
7      4912766  10.0%

Latin
Value  Count    Frequency (%)
a      4924966  16.7%
e      4924738  16.7%
f      4920159  16.7%
c      4918898  16.7%
d      4917989  16.7%
b      4911736  16.6%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  78720000  100.0%

Most frequent character per block

ASCII
Value             Count     Frequency (%)
2                 4926056   6.3%
9                 4925571   6.3%
a                 4924966   6.3%
e                 4924738   6.3%
8                 4924482   6.3%
4                 4920700   6.3%
1                 4920449   6.3%
f                 4920159   6.3%
6                 4919620   6.2%
0                 4919119   6.2%
Other values (6)  29494140  37.5%

article_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 82328
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 696401650.3
Minimum: 108775015
Maximum: 956217002
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.8 MiB

Quantile statistics

Minimum: 108775015
5-th percentile: 448515036
Q1: 632307011
median: 714384002
Q3: 787028001
95-th percentile: 870970001
Maximum: 956217002
Range: 847441987
Interquartile range (IQR): 154720990

Descriptive statistics

Standard deviation: 133245806.1
Coefficient of variation (CV): 0.1913347075
Kurtosis: 2.527405494
Mean: 696401650.3
Median Absolute Deviation (MAD): 77138000
Skewness: -1.245928275
Sum: 8.565740298 × 10^14
Variance: 1.775444483 × 10^16
Monotonicity: Not monotonic
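
Several of the article_id statistics are derived from others, so they can be cross-checked. A minimal sketch using only the values reported for this variable (the variable names are illustrative):

```python
# Values copied from the article_id statistics in this report.
minimum, maximum = 108_775_015, 956_217_002
q1, q3 = 632_307_011, 787_028_001
mean, std = 696_401_650.3, 133_245_806.1

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr, round(cv, 4), f"{variance:.4e}")
```

Because article_id is an identifier rather than a measurement, these moments describe how identifiers are distributed and carry no meaning as quantities.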
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count    Frequency (%)
706016001             1919     0.2%
706016002             1355     0.1%
372860001             1216     0.1%
610776002             1171     0.1%
464297007             1012     0.1%
759871002             975      0.1%
372860002             956      0.1%
610776001             866      0.1%
399223001             847      0.1%
720125001             820      0.1%
Other values (82318)  1218863  99.1%

Minimum 10 values

Value      Count  Frequency (%)
108775015  408    < 0.1%
108775044  280    < 0.1%
108775051  11     < 0.1%
110065001  39     < 0.1%
110065002  18     < 0.1%
110065011  34     < 0.1%
111565001  181    < 0.1%
111565003  1      < 0.1%
111586001  537    < 0.1%
111593001  494    < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
956217002  1      < 0.1%
953763001  1      < 0.1%
953450001  2      < 0.1%
949551002  6      < 0.1%
949551001  8      < 0.1%
949198001  3      < 0.1%
948152002  1      < 0.1%
948152001  1      < 0.1%
947934001  5      < 0.1%
947599001  1      < 0.1%

price
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 25984
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02778704007
Minimum: 0.0001355932203
Maximum: 0.506779661
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.8 MiB

Quantile statistics

Minimum: 0.0001355932203
5-th percentile: 0.007610169492
Q1: 0.01537288136
median: 0.02540677966
Q3: 0.03388135593
95-th percentile: 0.05930508475
Maximum: 0.506779661
Range: 0.5066440678
Interquartile range (IQR): 0.01850847458

Descriptive statistics

Standard deviation: 0.01935101504
Coefficient of variation (CV): 0.6964043306
Kurtosis: 27.30249346
Mean: 0.02778704007
Median Absolute Deviation (MAD): 0.008474576271
Skewness: 3.219502551
Sum: 34178.05928
Variance: 0.0003744617829
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count   Frequency (%)
0.01693220339         128992  10.5%
0.03388135593         128662  10.5%
0.02540677966         123167  10.0%
0.01354237288         56874   4.6%
0.0423559322          56642   4.6%
0.05083050847         56234   4.6%
0.02201694915         49025   4.0%
0.03049152542         46119   3.7%
0.008457627119        40769   3.3%
0.01523728814         26495   2.2%
Other values (25974)  517021  42.0%

Minimum 10 values

Value            Count  Frequency (%)
0.0001355932203  1      < 0.1%
0.0002372881356  1      < 0.1%
0.0003220338983  2      < 0.1%
0.0003559322034  2      < 0.1%
0.0003728813559  1      < 0.1%
0.0003898305085  1      < 0.1%
0.0004237288136  17     < 0.1%
0.0004406779661  2      < 0.1%
0.0004576271186  2      < 0.1%
0.0004915254237  4      < 0.1%

Maximum 10 values

Value         Count  Frequency (%)
0.506779661   4      < 0.1%
0.501220339   1      < 0.1%
0.4919988701  1      < 0.1%
0.4346975914  1      < 0.1%
0.4220338983  29     < 0.1%
0.4177966102  1      < 0.1%
0.4147941889  1      < 0.1%
0.4133903441  1      < 0.1%
0.4093220339  1      < 0.1%
0.4088895383  1      < 0.1%

sales_channel_id
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Value  Count
2      868239
1      361761

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1230000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 1
4th row: 2
5th row: 2

Common Values

Value  Count   Frequency (%)
2      868239  70.6%
1      361761  29.4%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

[Figure: category frequency plot]
Value  Count   Frequency (%)
2      868239  70.6%
1      361761  29.4%

Most occurring characters

Value  Count   Frequency (%)
2      868239  70.6%
1      361761  29.4%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1230000  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
2      868239  70.6%
1      361761  29.4%

Most occurring scripts

Value   Count    Frequency (%)
Common  1230000  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
2      868239  70.6%
1      361761  29.4%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1230000  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
2      868239  70.6%
1      361761  29.4%

sale
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 10.6 MiB

Value  Count    Frequency (%)
True   1200000  97.6%
False  30000    2.4%

product_code
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 39034
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 696401.6437
Minimum: 108775
Maximum: 956217
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.8 MiB

Quantile statistics

Minimum: 108775
5-th percentile: 448515
Q1: 632307
median: 714384
Q3: 787028
95-th percentile: 870970
Maximum: 956217
Range: 847442
Interquartile range (IQR): 154721

Descriptive statistics

Standard deviation: 133245.8094
Coefficient of variation (CV): 0.1913347141
Kurtosis: 2.527405227
Mean: 696401.6437
Median Absolute Deviation (MAD): 77138
Skewness: -1.245928234
Sum: 8.565740218 × 10^11
Variance: 1.775444572 × 10^10
Monotonicity: Not monotonic
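
The "High correlation" alert between product_code and article_id has a plausible explanation in the values shown in this report: each product_code looks like an article_id with its last three digits dropped (for example, the minima 108775015 and 108775, and the maxima 956217002 and 956217). The sketch below assumes that relationship. It is inferred from the reported values and is not stated by the report itself; the helper name is hypothetical.

```python
# Values that appear in this report for article_id and product_code
# (most common value, minimum, maximum). Their pairing is an assumption.
pairs = [
    (706_016_001, 706_016),
    (108_775_015, 108_775),
    (956_217_002, 956_217),
]


def product_code_from_article_id(article_id):
    """Drop the last three digits (assumed variant suffix) of an article_id."""
    return article_id // 1000


derived = [product_code_from_article_id(article_id) for article_id, _ in pairs]
print(derived)
```

If the relationship holds for every row, product_code adds no information beyond article_id and one of the two can be dropped before correlation analysis.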
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                 Count    Frequency (%)
706016                7006     0.6%
562245                6015     0.5%
610776                5279     0.4%
599580                4578     0.4%
717490                3082     0.3%
695632                2881     0.2%
372860                2833     0.2%
684209                2786     0.2%
759871                2592     0.2%
688537                2552     0.2%
Other values (39024)  1190396  96.8%

Minimum 10 values

Value   Count  Frequency (%)
108775  699    0.1%
110065  91     < 0.1%
111565  182    < 0.1%
111586  537    < 0.1%
111593  494    < 0.1%
111609  130    < 0.1%
114428  3      < 0.1%
116379  6      < 0.1%
118458  33     < 0.1%
120129  252    < 0.1%

Maximum 10 values

Value   Count  Frequency (%)
956217  1      < 0.1%
953763  1      < 0.1%
953450  2      < 0.1%
949551  14     < 0.1%
949198  3      < 0.1%
948152  2      < 0.1%
947934  5      < 0.1%
947599  1      < 0.1%
947509  9      < 0.1%
947060  5      < 0.1%

prod_name
Categorical

HIGH CARDINALITY

Distinct: 38202
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Value                     Count
Jade HW Skinny Denim TRS  6423
Luna skinny RW            5448
Timeless Midrise Brief    4578
Tilly (1)                 4036
Cat Tee.                  3082
Other values (38197)      1206433

Length

Max length: 30
Median length: 22
Mean length: 15.41491301
Min length: 1

Characters and Unicode

Total characters: 18960343
Distinct characters: 90
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6057
Unique (%): 0.5%

Sample

1st row: Hazelnut Push Melbourne
2nd row: Rachel
3rd row: Bonina loose tank
4th row: Edit fancy dress
5th row: Lady Di

Common Values

Value                        Count    Frequency (%)
Jade HW Skinny Denim TRS     6423     0.5%
Luna skinny RW               5448     0.4%
Timeless Midrise Brief       4578     0.4%
Tilly (1)                    4036     0.3%
Cat Tee.                     3082     0.3%
Shake it in Balconette       2788     0.2%
Simple as That Triangle Top  2786     0.2%
Tilda tank                   2592     0.2%
Simple as that Cheeky Tanga  2552     0.2%
Despacito                    2550     0.2%
Other values (38192)         1193165  97.0%

Length

[Figure: histogram of lengths of the category]

Words

Value                 Count    Frequency (%)
dress                 76799    2.3%
top                   68992    2.0%
hw                    54778    1.6%
1                     43668    1.3%
tee                   43078    1.3%
skinny                41934    1.2%
denim                 33463    1.0%
shorts                31757    0.9%
trs                   31748    0.9%
push                  30582    0.9%
Other values (12375)  2916588  86.5%

Most occurring characters

ValueCountFrequency (%)
2149171
 
11.3%
e1430730
 
7.5%
a1220693
 
6.4%
i1044605
 
5.5%
s909621
 
4.8%
r897181
 
4.7%
n851277
 
4.5%
o760560
 
4.0%
t743435
 
3.9%
l738327
 
3.9%
Other values (80)8214743
43.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter12116310
63.9%
Uppercase Letter4169298
 
22.0%
Space Separator2149171
 
11.3%
Decimal Number203192
 
1.1%
Other Punctuation92692
 
0.5%
Open Punctuation81481
 
0.4%
Close Punctuation81172
 
0.4%
Dash Punctuation59682
 
0.3%
Math Symbol5632
 
< 0.1%
Modifier Symbol1213
 
< 0.1%
Other values (2)500
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1430730
11.8%
a1220693
 
10.1%
i1044605
 
8.6%
s909621
 
7.5%
r897181
 
7.4%
n851277
 
7.0%
o760560
 
6.3%
t743435
 
6.1%
l738327
 
6.1%
p416862
 
3.4%
Other values (23)3103019
25.6%
Uppercase Letter
ValueCountFrequency (%)
S479587
 
11.5%
T356434
 
8.5%
R257187
 
6.2%
E251058
 
6.0%
L248650
 
6.0%
P248608
 
6.0%
B219129
 
5.3%
A216370
 
5.2%
C214556
 
5.1%
M182256
 
4.4%
Other values (22)1495463
35.9%
Decimal Number
ValueCountFrequency (%)
176151
37.5%
239838
19.6%
324080
 
11.9%
516070
 
7.9%
913242
 
6.5%
013046
 
6.4%
78833
 
4.3%
47555
 
3.7%
83000
 
1.5%
61377
 
0.7%
Other Punctuation
ValueCountFrequency (%)
.69853
75.4%
/14266
 
15.4%
&5447
 
5.9%
!1801
 
1.9%
:935
 
1.0%
'344
 
0.4%
?46
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2149171
100.0%
Open Punctuation
ValueCountFrequency (%)
(81481
100.0%
Close Punctuation
ValueCountFrequency (%)
)81172
100.0%
Dash Punctuation
ValueCountFrequency (%)
-59682
100.0%
Math Symbol
ValueCountFrequency (%)
+5632
100.0%
Modifier Symbol
ValueCountFrequency (%)
^1213
100.0%
Connector Punctuation
ValueCountFrequency (%)
_499
100.0%
Other Symbol
ValueCountFrequency (%)
©1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin16285608
85.9%
Common2674735
 
14.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1430730
 
8.8%
a1220693
 
7.5%
i1044605
 
6.4%
s909621
 
5.6%
r897181
 
5.5%
n851277
 
5.2%
o760560
 
4.7%
t743435
 
4.6%
l738327
 
4.5%
S479587
 
2.9%
Other values (55)7209592
44.3%
Common
ValueCountFrequency (%)
2149171
80.4%
(81481
 
3.0%
)81172
 
3.0%
176151
 
2.8%
.69853
 
2.6%
-59682
 
2.2%
239838
 
1.5%
324080
 
0.9%
516070
 
0.6%
/14266
 
0.5%
Other values (15)62971
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII18959435
> 99.9%
None908
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2149171
 
11.3%
e1430730
 
7.5%
a1220693
 
6.4%
i1044605
 
5.5%
s909621
 
4.8%
r897181
 
4.7%
n851277
 
4.5%
o760560
 
4.0%
t743435
 
3.9%
l738327
 
3.9%
Other values (66)8213835
43.3%
None
ValueCountFrequency (%)
ö213
23.5%
é209
23.0%
è138
15.2%
Ä99
10.9%
Ö54
 
5.9%
í52
 
5.7%
ë43
 
4.7%
å34
 
3.7%
É32
 
3.5%
ä20
 
2.2%
Other values (4)14
 
1.5%

product_type_no
Real number (ℝ)

HIGH CORRELATION

Distinct: 128
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 245.4353106
Minimum: -1
Maximum: 762
Zeros: 0
Zeros (%): 0.0%
Negative: 3597
Negative (%): 0.3%
Memory size: 18.8 MiB

Quantile statistics

Minimum: -1
5-th percentile: 66
Q1: 253
median: 264
Q3: 273
95-th percentile: 306
Maximum: 762
Range: 763
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 68.5649378
Coefficient of variation (CV): 0.2793605274
Kurtosis: 3.221356815
Mean: 245.4353106
Median Absolute Deviation (MAD): 10
Skewness: -1.843928393
Sum: 301885432
Variance: 4701.150696
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count   Frequency (%)
272                 161142  13.1%
265                 123984  10.1%
252                 107245  8.7%
255                 85819   7.0%
254                 60523   4.9%
258                 57742   4.7%
253                 53568   4.4%
306                 50835   4.1%
274                 44651   3.6%
298                 42698   3.5%
Other values (118)  441793  35.9%

Minimum 10 values

Value  Count  Frequency (%)
-1     3597   0.3%
49     37     < 0.1%
57     10915  0.9%
59     41813  3.4%
60     25     < 0.1%
66     9148   0.7%
67     7821   0.6%
68     512    < 0.1%
69     1355   0.1%
70     7659   0.6%

Maximum 10 values

Value  Count  Frequency (%)
762    3      < 0.1%
761    6      < 0.1%
532    133    < 0.1%
529    74     < 0.1%
525    4      < 0.1%
523    15     < 0.1%
521    13     < 0.1%
515    53     < 0.1%
514    4      < 0.1%
512    1725   0.1%

product_type_name
Categorical

HIGH CARDINALITY

Distinct: 127
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Value               Count
Trousers            161142
Dress               123984
Sweater             107245
T-shirt             85819
Top                 60523
Other values (122)  691287

Length

Max length: 24
Median length: 19
Mean length: 7.497685366
Min length: 3

Characters and Unicode

Total characters: 9222153
Distinct characters: 51
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: Bra
2nd row: Sweater
3rd row: Vest top
4th row: Dress
5th row: Sweater

Common Values

Value               Count   Frequency (%)
Trousers            161142  13.1%
Dress               123984  10.1%
Sweater             107245  8.7%
T-shirt             85819   7.0%
Top                 60523   4.9%
Blouse              57742   4.7%
Vest top            53568   4.4%
Bra                 50835   4.1%
Shorts              44651   3.6%
Bikini top          42698   3.5%
Other values (117)  441793  35.9%

Length

[Figure: histogram of lengths of the category]

Words

Value               Count   Frequency (%)
trousers            161470  11.0%
top                 157194  10.7%
dress               123984  8.5%
sweater             107245  7.3%
bottom              86720   5.9%
t-shirt             85819   5.9%
blouse              57742   3.9%
vest                53568   3.7%
underwear           53077   3.6%
bra                 50886   3.5%
Other values (135)  526067  35.9%

Most occurring characters

ValueCountFrequency (%)
r1046926
 
11.4%
s1011179
 
11.0%
e953284
 
10.3%
t782260
 
8.5%
o709446
 
7.7%
i516494
 
5.6%
a464583
 
5.0%
T344145
 
3.7%
S317223
 
3.4%
u273526
 
3.0%
Other values (41)2803087
30.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7575875
82.1%
Uppercase Letter1278362
 
13.9%
Space Separator233772
 
2.5%
Dash Punctuation85860
 
0.9%
Other Punctuation48284
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r1046926
13.8%
s1011179
13.3%
e953284
12.6%
t782260
10.3%
o709446
9.4%
i516494
 
6.8%
a464583
 
6.1%
u273526
 
3.6%
w261929
 
3.5%
h211075
 
2.8%
Other values (15)1345173
17.8%
Uppercase Letter
ValueCountFrequency (%)
T344145
26.9%
S317223
24.8%
B201445
15.8%
D125547
 
9.8%
U57232
 
4.5%
V53568
 
4.2%
H36138
 
2.8%
J32893
 
2.6%
L28350
 
2.2%
P25950
 
2.0%
Other values (12)55871
 
4.4%
Other Punctuation
ValueCountFrequency (%)
/48280
> 99.9%
.4
 
< 0.1%
Space Separator
ValueCountFrequency (%)
233772
100.0%
Dash Punctuation
ValueCountFrequency (%)
-85860
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8854237
96.0%
Common367916
 
4.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r1046926
11.8%
s1011179
11.4%
e953284
 
10.8%
t782260
 
8.8%
o709446
 
8.0%
i516494
 
5.8%
a464583
 
5.2%
T344145
 
3.9%
S317223
 
3.6%
u273526
 
3.1%
Other values (37)2435171
27.5%
Common
ValueCountFrequency (%)
233772
63.5%
-85860
 
23.3%
/48280
 
13.1%
.4
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII9222153
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r1046926
 
11.4%
s1011179
 
11.0%
e953284
 
10.3%
t782260
 
8.5%
o709446
 
7.7%
i516494
 
5.6%
a464583
 
5.0%
T344145
 
3.7%
S317223
 
3.4%
u273526
 
3.0%
Other values (41)2803087
30.4%

product_group_name
Categorical

HIGH CORRELATION

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Value               Count
Garment Upper body  484557
Garment Lower body  270415
Garment Full body   136904
Underwear           98277
Swimwear            97859
Other values (14)   141988

Length

Max length: 21
Median length: 18
Mean length: 15.44760976
Min length: 3

Characters and Unicode

Total characters: 19000560
Distinct characters: 35
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Underwear
2nd row: Garment Upper body
3rd row: Garment Upper body
4th row: Garment Full body
5th row: Garment Upper body

Common Values

Value               Count   Frequency (%)
Garment Upper body  484557  39.4%
Garment Lower body  270415  22.0%
Garment Full body   136904  11.1%
Underwear           98277   8.0%
Swimwear            97859   8.0%
Accessories         67039   5.5%
Shoes               30091   2.4%
Socks & Tights      26763   2.2%
Nightwear           13812   1.1%
Unknown             3597    0.3%
Other values (9)    686     0.1%

Length

[Figure: histogram of lengths of the category]

Words

Value              Count   Frequency (%)
garment            891887  29.1%
body               891876  29.1%
upper              484557  15.8%
lower              270415  8.8%
full               136904  4.5%
underwear          98277   3.2%
swimwear           97859   3.2%
accessories        67039   2.2%
shoes              30091   1.0%
socks              26763   0.9%
Other values (16)  71646   2.3%

Most occurring characters

ValueCountFrequency (%)
e2119752
 
11.2%
r2022312
 
10.6%
1837314
 
9.7%
o1289878
 
6.8%
a1102233
 
5.8%
n1001081
 
5.3%
d990206
 
5.2%
m990062
 
5.2%
p969114
 
5.1%
t932861
 
4.9%
Other values (25)5745747
30.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter14987791
78.9%
Uppercase Letter2148650
 
11.3%
Space Separator1837314
 
9.7%
Other Punctuation26805
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2119752
14.1%
r2022312
13.5%
o1289878
8.6%
a1102233
7.4%
n1001081
 
6.7%
d990206
 
6.6%
m990062
 
6.6%
p969114
 
6.5%
t932861
 
6.2%
y891882
 
6.0%
Other values (11)2678410
17.9%
Uppercase Letter
ValueCountFrequency (%)
G891887
41.5%
U586473
27.3%
L270415
 
12.6%
S154730
 
7.2%
F136926
 
6.4%
A67039
 
3.1%
T26763
 
1.2%
N13812
 
0.6%
B286
 
< 0.1%
I242
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
&26763
99.8%
/42
 
0.2%
Space Separator
ValueCountFrequency (%)
1837314
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin17136441
90.2%
Common1864119
 
9.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2119752
12.4%
r2022312
11.8%
o1289878
 
7.5%
a1102233
 
6.4%
n1001081
 
5.8%
d990206
 
5.8%
m990062
 
5.8%
p969114
 
5.7%
t932861
 
5.4%
G891887
 
5.2%
Other values (22)4827055
28.2%
Common
ValueCountFrequency (%)
1837314
98.6%
&26763
 
1.4%
/42
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII19000560
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e2119752
 
11.2%
r2022312
 
10.6%
1837314
 
9.7%
o1289878
 
6.8%
a1102233
 
5.8%
n1001081
 
5.3%
d990206
 
5.2%
m990062
 
5.2%
p969114
 
5.1%
t932861
 
4.9%
Other values (25)5745747
30.2%

graphical_appearance_no
Real number (ℝ)

HIGH CORRELATION
SKEWED

Distinct: 30
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1009740.487
Minimum: -1
Maximum: 1010029
Zeros: 0
Zeros (%): 0.0%
Negative: 333
Negative (%): < 0.1%
Memory size: 18.8 MiB

Quantile statistics

Minimum: -1
5-th percentile: 1010001
Q1: 1010010
median: 1010016
Q3: 1010016
95-th percentile: 1010023
Maximum: 1010029
Range: 1010030
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 16616.46993
Coefficient of variation (CV): 0.01645617874
Kurtosis: 3688.707895
Mean: 1009740.487
Median Absolute Deviation (MAD): 0
Skewness: -60.75114289
Sum: 1.241980799 × 10^12
Variance: 276107072.9
Monotonicity: Not monotonic
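
The extreme skewness reported for graphical_appearance_no is what a sentinel value produces: 333 rows hold -1 while every other row holds a code near 1010000. A rough sketch of that effect, assuming a two-point approximation in which the spread among the regular codes is ignored (that simplification is an assumption of this example, not something stated in the report):

```python
import math

# Counts copied from this report.
n_observations = 1_230_000
n_sentinel = 333  # rows where graphical_appearance_no is -1

# Two-point approximation: a fraction p of the rows sits far below the rest.
# For such a distribution the skewness is (2p - 1) / sqrt(p * (1 - p)).
p = n_sentinel / n_observations
skewness = (2 * p - 1) / math.sqrt(p * (1 - p))

print(round(skewness, 2))
```

The approximation lands very close to the reported γ1, which suggests the "Skewed" alert reflects the -1 placeholder and not the shape of the real codes. Treating -1 as missing before profiling would likely remove the alert.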
[Figure: histogram with fixed size bins (bins=30)]

Common values

Value              Count   Frequency (%)
1010016            688101  55.9%
1010001            154917  12.6%
1010023            75193   6.1%
1010010            73541   6.0%
1010017            56140   4.6%
1010026            28222   2.3%
1010004            22874   1.9%
1010021            22530   1.8%
1010014            17410   1.4%
1010008            13114   1.1%
Other values (20)  77958   6.3%

Minimum 10 values

Value    Count   Frequency (%)
-1       333     < 0.1%
1010001  154917  12.6%
1010002  5721    0.5%
1010003  45      < 0.1%
1010004  22874   1.9%
1010005  9611    0.8%
1010006  9299    0.8%
1010007  12241   1.0%
1010008  13114   1.1%
1010009  7160    0.6%

Maximum 10 values

Value    Count  Frequency (%)
1010029  13     < 0.1%
1010028  1541   0.1%
1010027  1144   0.1%
1010026  28222  2.3%
1010025  869    0.1%
1010024  1387   0.1%
1010023  75193  6.1%
1010022  6807   0.6%
1010021  22530  1.8%
1010020  6694   0.5%

graphical_appearance_name
Categorical

HIGH CORRELATION

Distinct: 30
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Value              Count
Solid              688101
All over pattern   154917
Denim              75193
Melange            73541
Stripe             56140
Other values (25)  182108

Length

Max length: 19
Median length: 5
Mean length: 7.308760163
Min length: 3

Characters and Unicode

Total characters: 8989775
Distinct characters: 42
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Solid
2nd row: Melange
3rd row: Solid
4th row: All over pattern
5th row: Solid

Common Values

Value              Count   Frequency (%)
Solid              688101  55.9%
All over pattern   154917  12.6%
Denim              75193   6.1%
Melange            73541   6.0%
Stripe             56140   4.6%
Other structure    28222   2.3%
Check              22874   1.9%
Lace               22530   1.8%
Placement print    17410   1.4%
Front print        13114   1.1%
Other values (20)  77958   6.3%

Length

[Figure: histogram of lengths of the category]

Words

Value              Count   Frequency (%)
solid              688101  42.7%
pattern            156618  9.7%
over               154917  9.6%
all                154917  9.6%
denim              75193   4.7%
melange            73541   4.6%
stripe             56140   3.5%
print              30524   1.9%
other              29923   1.9%
structure          28222   1.7%
Other values (25)  164805  10.2%

Most occurring characters

ValueCountFrequency (%)
l1144227
12.7%
o922288
10.3%
i916076
10.2%
e778360
8.7%
S747945
8.3%
d713167
 
7.9%
t584594
 
6.5%
r562653
 
6.3%
n408194
 
4.5%
382901
 
4.3%
Other values (32)1829370
20.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter7342382
81.7%
Uppercase Letter1242881
 
13.8%
Space Separator382901
 
4.3%
Other Punctuation15890
 
0.2%
Decimal Number5721
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l1144227
15.6%
o922288
12.6%
i916076
12.5%
e778360
10.6%
d713167
9.7%
t584594
8.0%
r562653
7.7%
n408194
 
5.6%
a317340
 
4.3%
p259585
 
3.5%
Other values (13)735898
10.0%
Uppercase Letter
ValueCountFrequency (%)
S747945
60.2%
A160683
 
12.9%
D90213
 
7.3%
M87742
 
7.1%
C40566
 
3.3%
O29923
 
2.4%
L22530
 
1.8%
P17410
 
1.4%
F13114
 
1.1%
E12241
 
1.0%
Other values (6)20514
 
1.7%
Space Separator
ValueCountFrequency (%)
382901
100.0%
Other Punctuation
ValueCountFrequency (%)
/15890
100.0%
Decimal Number
ValueCountFrequency (%)
35721
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8585263
95.5%
Common404512
 
4.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
l1144227
13.3%
o922288
10.7%
i916076
10.7%
e778360
9.1%
S747945
8.7%
d713167
8.3%
t584594
 
6.8%
r562653
 
6.6%
n408194
 
4.8%
a317340
 
3.7%
Other values (29)1490419
17.4%
Common
ValueCountFrequency (%)
382901
94.7%
/15890
 
3.9%
35721
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII8989775
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l1144227
12.7%
o922288
10.3%
i916076
10.2%
e778360
8.7%
S747945
8.3%
d713167
 
7.9%
t584594
 
6.5%
r562653
 
6.3%
n408194
 
4.5%
382901
 
4.3%
Other values (32)1829370
20.3%

colour_group_code
Real number (ℝ)

HIGH CORRELATION

Distinct: 50
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.40332114
Minimum: -1
Maximum: 93
Zeros: 0
Zeros (%): 0.0%
Negative: 207
Negative (%): < 0.1%
Memory size: 18.8 MiB

Quantile statistics

Minimum: -1
5-th percentile: 7
Q1: 9
median: 10
Q3: 43
95-th percentile: 73
Maximum: 93
Range: 94
Interquartile range (IQR): 34

Descriptive statistics

Standard deviation: 26.21933171
Coefficient of variation (CV): 0.99303158
Kurtosis: -0.07699996051
Mean: 26.40332114
Median Absolute Deviation (MAD): 2
Skewness: 1.206646349
Sum: 32476085
Variance: 687.4533552
Monotonicity: Not monotonic
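
Several of the figures above are derived from one another, so they can be cross-checked. The following is a minimal sketch that re-derives the mean, variance, coefficient of variation, range and IQR from the other reported values. The row count of 1,230,000 is taken from the report overview, and the check of the mean assumes no missing values (the report lists Missing as 0). The variable names are illustrative only.

```python
import math

# Figures copied from the colour_group_code summary above.
n_rows = 1_230_000          # number of observations, from the report overview
total = 32_476_085          # Sum
mean = 26.40332114
std = 26.21933171
variance = 687.4533552
cv = 0.99303158
minimum, maximum = -1, 93
q1, q3 = 9, 43

# Each entry re-derives one reported statistic from the others.
checks = {
    "mean = sum / n": math.isclose(total / n_rows, mean, rel_tol=1e-6),
    "variance = std^2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "range = max - min": maximum - minimum == 94,
    "iqr = q3 - q1": q3 - q1 == 34,
}

for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

The same checks can be repeated for the other numeric variables by swapping in their reported figures.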
Histogram with fixed size bins (bins=50)
Common values
Value              Count    Frequency (%)
9                  421724   34.3%
10                 129229   10.5%
73                 84476    6.9%
12                 48520    3.9%
72                 41798    3.4%
13                 35672    2.9%
71                 35155    2.9%
51                 34866    2.8%
7                  33660    2.7%
11                 32689    2.7%
Other values (40)  332211   27.0%
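
The "Other values" row and the percentage column follow from simple arithmetic on the counts. The sketch below assumes the fused rows read as value 9 with count 421724, value 10 with count 129229, and so on (the split that matches the colour_group_name counts elsewhere in this report), and that percentages are counts divided by the 1,230,000 rows, rounded to one decimal. The helper `frequency_pct` is a hypothetical name.

```python
# Counts copied from the colour_group_code value table above.
N_ROWS = 1_230_000
top_counts = {
    9: 421_724, 10: 129_229, 73: 84_476, 12: 48_520, 72: 41_798,
    13: 35_672, 71: 35_155, 51: 34_866, 7: 33_660, 11: 32_689,
}


def frequency_pct(count, n_rows=N_ROWS):
    """Share of rows as a percentage, rounded to one decimal place."""
    return round(100 * count / n_rows, 1)


# Everything not covered by the ten listed values falls into "Other values".
other_count = N_ROWS - sum(top_counts.values())

print(other_count, frequency_pct(other_count))
print(frequency_pct(top_counts[9]))
```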
Smallest values (ascending)
Value  Count    Frequency (%)
-1     207      < 0.1%
1      853      0.1%
2      362      < 0.1%
3      4917     0.4%
4      424      < 0.1%
5      10488    0.9%
6      15991    1.3%
7      33660    2.7%
8      27883    2.3%
9      421724   34.3%

Largest values (descending)
Value  Count    Frequency (%)
93     27365    2.2%
92     7297     0.6%
91     4807     0.4%
90     618      0.1%
83     4588     0.4%
82     2888     0.2%
81     3086     0.3%
80     26       < 0.1%
73     84476    6.9%
72     41798    3.4%

colour_group_name
Categorical

HIGH CORRELATION

Distinct: 50
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Black              421724
White              129229
Dark Blue          84476
Light Beige        48520
Blue               41798
Other values (45)  504253

Length

Max length: 15
Median length: 14
Mean length: 6.929918699
Min length: 3

Characters and Unicode

Total characters: 8523800
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Light Orange
2nd row: Dark Grey
3rd row: White
4th row: Black
5th row: Dark Blue
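
As a rough illustration of the Unicode note above, character categories can be tallied with Python's standard `unicodedata` module. This is a minimal sketch, not the profiler's own implementation. The helper name `category_counts` is hypothetical, and the input is only the five sample rows listed above, not the full column.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            # e.g. "Lu" = Uppercase Letter, "Ll" = Lowercase Letter,
            # "Zs" = Space Separator
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["Light Orange", "Dark Grey", "White", "Black", "Dark Blue"]
counts = category_counts(sample)
print(counts)
```

On the full column the report lists four categories; the fourth, Other Punctuation, comes from values containing "/" that do not appear in this sample.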

Common Values

Value              Count    Frequency (%)
Black              421724   34.3%
White              129229   10.5%
Dark Blue          84476    6.9%
Light Beige        48520    3.9%
Blue               41798    3.4%
Beige              35672    2.9%
Light Blue         35155    2.9%
Light Pink         34866    2.8%
Grey               33660    2.7%
Off White          32689    2.7%
Other values (40)  332211   27.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
black421724
24.9%
dark207702
12.2%
light170468
10.0%
white161918
 
9.5%
blue161767
 
9.5%
beige97814
 
5.8%
grey77534
 
4.6%
pink64611
 
3.8%
red59630
 
3.5%
green40087
 
2.4%
Other values (16)233117
13.7%

Most occurring characters

ValueCountFrequency (%)
e910641
 
10.7%
k723650
 
8.5%
l699664
 
8.2%
B697896
 
8.2%
a692981
 
8.1%
i587631
 
6.9%
466372
 
5.5%
r439913
 
5.2%
c421724
 
4.9%
h418158
 
4.9%
Other values (28)2465170
28.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter6360208
74.6%
Uppercase Letter1696796
 
19.9%
Space Separator466372
 
5.5%
Other Punctuation424
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e910641
14.3%
k723650
11.4%
l699664
11.0%
a692981
10.9%
i587631
9.2%
r439913
6.9%
c421724
6.6%
h418158
6.6%
t341205
 
5.4%
g301707
 
4.7%
Other values (12)822934
12.9%
Uppercase Letter
ValueCountFrequency (%)
B697896
41.1%
D207702
 
12.2%
L170468
 
10.0%
W161918
 
9.5%
G159851
 
9.4%
O74571
 
4.4%
P72333
 
4.3%
R59630
 
3.5%
Y46523
 
2.7%
K29406
 
1.7%
Other values (4)16498
 
1.0%
Space Separator
ValueCountFrequency (%)
466372
100.0%
Other Punctuation
ValueCountFrequency (%)
/424
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin8057004
94.5%
Common466796
 
5.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e910641
11.3%
k723650
 
9.0%
l699664
 
8.7%
B697896
 
8.7%
a692981
 
8.6%
i587631
 
7.3%
r439913
 
5.5%
c421724
 
5.2%
h418158
 
5.2%
t341205
 
4.2%
Other values (26)2123541
26.4%
Common
ValueCountFrequency (%)
466372
99.9%
/424
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII8523800
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e910641
 
10.7%
k723650
 
8.5%
l699664
 
8.2%
B697896
 
8.2%
a692981
 
8.1%
i587631
 
6.9%
466372
 
5.5%
r439913
 
5.2%
c421724
 
4.9%
h418158
 
4.9%
Other values (28)2465170
28.9%

perceived_colour_value_id
Real number (ℝ)

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.262581301
Minimum: -1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 207
Negative (%): < 0.1%
Memory size: 18.8 MiB

Quantile statistics

Minimum: -1
5-th percentile: 1
Q1: 2
median: 4
Q3: 4
95-th percentile: 5
Maximum: 7
Range: 8
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.421624465
Coefficient of variation (CV): 0.4357361039
Kurtosis: 0.2053572724
Mean: 3.262581301
Median Absolute Deviation (MAD): 1
Skewness: 0.07923424239
Sum: 4012975
Variance: 2.021016118
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Common values
Value  Count    Frequency (%)
4      598176   48.6%
1      211592   17.2%
3      176942   14.4%
2      145030   11.8%
5      48759    4.0%
7      48441    3.9%
6      853      0.1%
-1     207      < 0.1%

Smallest values (ascending)
Value  Count    Frequency (%)
-1     207      < 0.1%
1      211592   17.2%
2      145030   11.8%
3      176942   14.4%
4      598176   48.6%
5      48759    4.0%
6      853      0.1%
7      48441    3.9%

Largest values (descending)
Value  Count    Frequency (%)
7      48441    3.9%
6      853      0.1%
5      48759    4.0%
4      598176   48.6%
3      176942   14.4%
2      145030   11.8%
1      211592   17.2%
-1     207      < 0.1%
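
The only negative value in the tables above is -1, with 207 rows. If -1 is a placeholder code rather than a real category id (an assumption; the report does not say so), it pulls numeric summaries such as the mean slightly downward. The sketch below recomputes the mean from the reported sum and row count with those rows excluded. All variable names are illustrative.

```python
# Figures copied from the perceived_colour_value_id summary above.
n_rows = 1_230_000      # number of observations, from the report overview
total = 4_012_975       # Sum
sentinel = -1           # ASSUMPTION: -1 is a placeholder, not a real id
n_sentinel = 207        # rows holding the value -1

# Mean over all rows, as the report computes it.
mean_all = total / n_rows

# Mean after removing the placeholder rows from both numerator and denominator.
mean_without = (total - sentinel * n_sentinel) / (n_rows - n_sentinel)

print(f"mean with -1 rows:    {mean_all:.6f}")
print(f"mean without -1 rows: {mean_without:.6f}")
```

Because these ids are category codes, the mean has limited meaning either way; the point is only that placeholder codes take part in every numeric summary unless they are filtered out first.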

perceived_colour_value_name
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Dark              598176
Dusty Light       211592
Light             176942
Medium Dusty      145030
Bright            48759
Other values (3)  49501

Length

Max length: 12
Median length: 11
Mean length: 6.453343089
Min length: 4

Characters and Unicode

Total characters: 7937612
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Dusty Light
2nd row: Dark
3rd row: Light
4th row: Dark
5th row: Dark

Common Values

Value         Count    Frequency (%)
Dark          598176   48.6%
Dusty Light   211592   17.2%
Light         176942   14.4%
Medium Dusty  145030   11.8%
Bright        48759    4.0%
Medium        48441    3.9%
Undefined     853      0.1%
Unknown       207      < 0.1%
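
Because all eight values are listed, the length and character totals reported for perceived_colour_value_name can be recomputed from this table alone. The following is a minimal sketch under that assumption; it uses only the counts shown above, and the variable names are illustrative.

```python
# Value counts copied from the Common Values table above.
value_counts = {
    "Dark": 598_176,
    "Dusty Light": 211_592,
    "Light": 176_942,
    "Medium Dusty": 145_030,
    "Bright": 48_759,
    "Medium": 48_441,
    "Undefined": 853,
    "Unknown": 207,
}

n_rows = sum(value_counts.values())

# Total characters = sum over values of (string length * number of rows).
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows
min_length = min(len(value) for value in value_counts)
max_length = max(len(value) for value in value_counts)

print(n_rows, total_chars, round(mean_length, 9), min_length, max_length)
```

The results agree with the reported Total characters, Mean length, Min length and Max length.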

Length

Histogram of lengths of the category

Category Frequency Plot

Value      Count    Frequency (%)
dark       598176   37.7%
light      388534   24.5%
dusty      356622   22.5%
medium     193471   12.2%
bright     48759    3.1%
undefined  853      0.1%
unknown    207      < 0.1%
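
The word counts above are consistent with lower-casing each value and splitting it on whitespace, then weighting every word by the number of rows holding that value. That tokenisation rule is inferred from the numbers, not stated in the report, so the sketch below should be read as an assumption that happens to reproduce the table.

```python
from collections import Counter

# Value counts copied from the perceived_colour_value_name Common Values table.
value_counts = {
    "Dark": 598_176,
    "Dusty Light": 211_592,
    "Light": 176_942,
    "Medium Dusty": 145_030,
    "Bright": 48_759,
    "Medium": 48_441,
    "Undefined": 853,
    "Unknown": 207,
}

word_counts = Counter()
for value, count in value_counts.items():
    # ASSUMPTION: words are lower-cased, whitespace-separated tokens.
    for word in value.lower().split():
        word_counts[word] += count

for word, count in word_counts.most_common():
    print(word, count)
```

For example, "light" receives the rows of both "Dusty Light" and "Light", which gives 211592 + 176942 = 388534.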

Most occurring characters

ValueCountFrequency (%)
D954798
12.0%
t793915
10.0%
r646935
 
8.2%
i631617
 
8.0%
k598383
 
7.5%
a598176
 
7.5%
u550093
 
6.9%
h437293
 
5.5%
g437293
 
5.5%
L388534
 
4.9%
Other values (13)1900575
23.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter5994368
75.5%
Uppercase Letter1586622
 
20.0%
Space Separator356622
 
4.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t793915
13.2%
r646935
10.8%
i631617
10.5%
k598383
10.0%
a598176
10.0%
u550093
9.2%
h437293
7.3%
g437293
7.3%
y356622
5.9%
s356622
5.9%
Other values (7)587419
9.8%
Uppercase Letter
ValueCountFrequency (%)
D954798
60.2%
L388534
24.5%
M193471
 
12.2%
B48759
 
3.1%
U1060
 
0.1%
Space Separator
ValueCountFrequency (%)
356622
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7580990
95.5%
Common356622
 
4.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
D954798
12.6%
t793915
10.5%
r646935
8.5%
i631617
8.3%
k598383
 
7.9%
a598176
 
7.9%
u550093
 
7.3%
h437293
 
5.8%
g437293
 
5.8%
L388534
 
5.1%
Other values (12)1543953
20.4%
Common
ValueCountFrequency (%)
356622
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII7937612
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
D954798
12.0%
t793915
10.0%
r646935
 
8.2%
i631617
 
8.0%
k598383
 
7.5%
a598176
 
7.5%
u550093
 
6.9%
h437293
 
5.5%
g437293
 
5.5%
L388534
 
4.9%
Other values (13)1900575
23.9%

perceived_colour_master_id
Real number (ℝ)

HIGH CORRELATION

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.61538374
Minimum: -1
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Negative: 10434
Negative (%): 0.8%
Memory size: 18.8 MiB

Quantile statistics

Minimum: -1
5-th percentile: 2
Q1: 5
median: 5
Q3: 11
95-th percentile: 19
Maximum: 20
Range: 21
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.061592998
Coefficient of variation (CV): 0.6646537025
Kurtosis: 0.1165722332
Mean: 7.61538374
Median Absolute Deviation (MAD): 3
Skewness: 0.9687374366
Sum: 9366922
Variance: 25.61972368
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Common values
Value              Count    Frequency (%)
5                  418691   34.0%
9                  162441   13.2%
2                  161946   13.2%
11                 79527    6.5%
12                 75152    6.1%
4                  63831    5.2%
18                 60190    4.9%
19                 36371    3.0%
20                 34489    2.8%
8                  29167    2.4%
Other values (10)  108195   8.8%

Smallest values (ascending)
Value  Count    Frequency (%)
-1     10434    0.8%
1      12937    1.1%
2      161946   13.2%
3      27095    2.2%
4      63831    5.2%
5      418691   34.0%
6      7660     0.6%
7      9169     0.7%
8      29167    2.4%
9      162441   13.2%

Largest values (descending)
Value  Count    Frequency (%)
20     34489    2.8%
19     36371    3.0%
18     60190    4.9%
16     7        < 0.1%
15     15829    1.3%
14     853      0.1%
13     24182    2.0%
12     75152    6.1%
11     79527    6.5%
10     29       < 0.1%

perceived_colour_master_name
Categorical

HIGH CORRELATION

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Black              418691
White              162441
Blue               161946
Beige              79527
Grey               75152
Other values (15)  332243

Length

Max length: 15
Median length: 5
Mean length: 4.954361789
Min length: 3

Characters and Unicode

Total characters: 6093865
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Orange
2nd row: Grey
3rd row: White
4th row: Black
5th row: Blue

Common Values

Value              Count    Frequency (%)
Black              418691   34.0%
White              162441   13.2%
Blue               161946   13.2%
Beige              79527    6.5%
Grey               75152    6.1%
Pink               63831    5.2%
Red                60190    4.9%
Green              36371    3.0%
Khaki green        34489    2.8%
Yellow             29167    2.4%
Other values (10)  108195   8.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
black418691
32.9%
white162441
 
12.8%
blue161946
 
12.7%
beige79527
 
6.3%
grey75152
 
5.9%
green70896
 
5.6%
pink63831
 
5.0%
red60190
 
4.7%
khaki34489
 
2.7%
yellow29167
 
2.3%
Other values (11)115855
 
9.1%

Most occurring characters

ValueCountFrequency (%)
e864167
14.2%
B684353
11.2%
l683122
11.2%
k527445
 
8.7%
a503764
 
8.3%
c426351
 
7.0%
i358006
 
5.9%
n219012
 
3.6%
r214154
 
3.5%
h196966
 
3.2%
Other values (23)1416525
23.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4814837
79.0%
Uppercase Letter1236843
 
20.3%
Space Separator42185
 
0.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e864167
17.9%
l683122
14.2%
k527445
11.0%
a503764
10.5%
c426351
8.9%
i358006
7.4%
n219012
 
4.5%
r214154
 
4.4%
h196966
 
4.1%
u188804
 
3.9%
Other values (10)633046
13.1%
Uppercase Letter
ValueCountFrequency (%)
B684353
55.3%
W162441
 
13.1%
G111559
 
9.0%
P71491
 
5.8%
R60190
 
4.9%
K34489
 
2.8%
Y29196
 
2.4%
M28766
 
2.3%
O27095
 
2.2%
U10434
 
0.8%
Other values (2)16829
 
1.4%
Space Separator
ValueCountFrequency (%)
42185
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6051680
99.3%
Common42185
 
0.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e864167
14.3%
B684353
11.3%
l683122
11.3%
k527445
 
8.7%
a503764
 
8.3%
c426351
 
7.0%
i358006
 
5.9%
n219012
 
3.6%
r214154
 
3.5%
h196966
 
3.3%
Other values (22)1374340
22.7%
Common
ValueCountFrequency (%)
42185
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII6093865
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e864167
14.2%
B684353
11.2%
l683122
11.2%
k527445
 
8.7%
a503764
 
8.3%
c426351
 
7.0%
i358006
 
5.9%
n219012
 
3.6%
r214154
 
3.5%
h196966
 
3.2%
Other values (23)1416525
23.2%

department_no
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 298
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2906.439874
Minimum: 1201
Maximum: 9989
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.8 MiB

Quantile statistics

Minimum: 1201
5-th percentile: 1322
Q1: 1616
median: 1717
Q3: 3948
95-th percentile: 8310
Maximum: 9989
Range: 8788
Interquartile range (IQR): 2332

Descriptive statistics

Standard deviation: 2121.898698
Coefficient of variation (CV): 0.7300679837
Kurtosis: 1.288360006
Mean: 2906.439874
Median Absolute Deviation (MAD): 373
Skewness: 1.519970575
Sum: 3574921045
Variance: 4502454.086
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count    Frequency (%)
4242                92729    7.5%
1676                54663    4.4%
1338                45629    3.7%
1722                43805    3.6%
1636                43741    3.6%
1643                43577    3.5%
1522                34567    2.8%
1626                32437    2.6%
1747                30775    2.5%
1322                29159    2.4%
Other values (288)  778918   63.3%

Smallest values (ascending)
Value  Count   Frequency (%)
1201   13673   1.1%
1202   32      < 0.1%
1212   6803    0.6%
1222   7763    0.6%
1241   397     < 0.1%
1244   7565    0.6%
1310   3506    0.3%
1313   9541    0.8%
1322   29159   2.4%
1334   20912   1.7%

Largest values (descending)
Value  Count   Frequency (%)
9989   602     < 0.1%
9986   1268    0.1%
9985   1541    0.1%
9984   1600    0.1%
9020   78      < 0.1%
8956   1656    0.1%
8917   1645    0.1%
8888   3627    0.3%
8852   494     < 0.1%
8815   35      < 0.1%

department_name
Categorical

HIGH CARDINALITY

Distinct: 249
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Swimwear            94329
Trouser             66248
Blouse              62716
Knitwear            60613
Jersey              57898
Other values (244)  888196

Length

Max length: 40
Median length: 34
Mean length: 10.72118943
Min length: 2

Characters and Unicode

Total characters: 13187063
Distinct characters: 60
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Expressive Lingerie
2nd row: Knitwear
3rd row: Tops Fancy Jersey
4th row: Young Girl Jersey Basic
5th row: Knitwear

Common Values

Value                Count    Frequency (%)
Swimwear             94329    7.7%
Trouser              66248    5.4%
Blouse               62716    5.1%
Knitwear             60613    4.9%
Jersey               57898    4.7%
Jersey Basic         55688    4.5%
Expressive Lingerie  45629    3.7%
Jersey fancy         43741    3.6%
Basic 1              43577    3.5%
Dress                42206    3.4%
Other values (239)   657355   53.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
jersey235721
 
11.5%
basic134594
 
6.6%
swimwear96185
 
4.7%
fancy94223
 
4.6%
knitwear93269
 
4.5%
lingerie83901
 
4.1%
blouse69568
 
3.4%
trouser69107
 
3.4%
tops68281
 
3.3%
trousers64328
 
3.1%
Other values (132)1043029
50.8%

Most occurring characters

ValueCountFrequency (%)
e1699899
 
12.9%
s1414715
 
10.7%
r1252118
 
9.5%
i865421
 
6.6%
822206
 
6.2%
a736622
 
5.6%
o628710
 
4.8%
n471326
 
3.6%
t462198
 
3.5%
y394158
 
3.0%
Other values (50)4439690
33.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter10319293
78.3%
Uppercase Letter1946923
 
14.8%
Space Separator822206
 
6.2%
Other Punctuation48727
 
0.4%
Decimal Number45138
 
0.3%
Math Symbol4312
 
< 0.1%
Dash Punctuation464
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1699899
16.5%
s1414715
13.7%
r1252118
12.1%
i865421
8.4%
a736622
 
7.1%
o628710
 
6.1%
n471326
 
4.6%
t462198
 
4.5%
y394158
 
3.8%
w342693
 
3.3%
Other values (16)2051433
19.9%
Uppercase Letter
ValueCountFrequency (%)
B309312
15.9%
S296954
15.3%
J253747
13.0%
T215706
11.1%
D158111
8.1%
L150740
7.7%
K119734
 
6.1%
F63771
 
3.3%
E62709
 
3.2%
W59850
 
3.1%
Other values (13)256289
13.2%
Decimal Number
ValueCountFrequency (%)
144664
98.9%
5314
 
0.7%
680
 
0.2%
278
 
0.2%
72
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
&28116
57.7%
/20226
41.5%
.385
 
0.8%
Space Separator
ValueCountFrequency (%)
822206
100.0%
Math Symbol
ValueCountFrequency (%)
+4312
100.0%
Dash Punctuation
ValueCountFrequency (%)
-464
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin12266216
93.0%
Common920847
 
7.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1699899
13.9%
s1414715
 
11.5%
r1252118
 
10.2%
i865421
 
7.1%
a736622
 
6.0%
o628710
 
5.1%
n471326
 
3.8%
t462198
 
3.8%
y394158
 
3.2%
w342693
 
2.8%
Other values (39)3998356
32.6%
Common
ValueCountFrequency (%)
822206
89.3%
144664
 
4.9%
&28116
 
3.1%
/20226
 
2.2%
+4312
 
0.5%
-464
 
0.1%
.385
 
< 0.1%
5314
 
< 0.1%
680
 
< 0.1%
278
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII13187063
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1699899
 
12.9%
s1414715
 
10.7%
r1252118
 
9.5%
i865421
 
6.6%
822206
 
6.2%
a736622
 
5.6%
o628710
 
4.8%
n471326
 
3.6%
t462198
 
3.5%
y394158
 
3.0%
Other values (50)4439690
33.7%

index_code
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

A                 496142
D                 273290
B                 212014
C                 72260
F                 70450
Other values (5)  105844

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1230000
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: B
2nd row: A
3rd row: D
4th row: I
5th row: A

Common Values

Value  Count    Frequency (%)
A      496142   40.3%
D      273290   22.2%
B      212014   17.2%
C      72260    5.9%
F      70450    5.7%
S      47961    3.9%
I      21369    1.7%
H      18015    1.5%
G      12828    1.0%
J      5671     0.5%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
a496142
40.3%
d273290
22.2%
b212014
17.2%
c72260
 
5.9%
f70450
 
5.7%
s47961
 
3.9%
i21369
 
1.7%
h18015
 
1.5%
g12828
 
1.0%
j5671
 
0.5%

Most occurring characters

ValueCountFrequency (%)
A496142
40.3%
D273290
22.2%
B212014
17.2%
C72260
 
5.9%
F70450
 
5.7%
S47961
 
3.9%
I21369
 
1.7%
H18015
 
1.5%
G12828
 
1.0%
J5671
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1230000
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A496142
40.3%
D273290
22.2%
B212014
17.2%
C72260
 
5.9%
F70450
 
5.7%
S47961
 
3.9%
I21369
 
1.7%
H18015
 
1.5%
G12828
 
1.0%
J5671
 
0.5%

Most occurring scripts

ValueCountFrequency (%)
Latin1230000
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A496142
40.3%
D273290
22.2%
B212014
17.2%
C72260
 
5.9%
F70450
 
5.7%
S47961
 
3.9%
I21369
 
1.7%
H18015
 
1.5%
G12828
 
1.0%
J5671
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1230000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A496142
40.3%
D273290
22.2%
B212014
17.2%
C72260
 
5.9%
F70450
 
5.7%
S47961
 
3.9%
I21369
 
1.7%
H18015
 
1.5%
G12828
 
1.0%
J5671
 
0.5%

index_name
Categorical

HIGH CORRELATION

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Ladieswear          496142
Divided             273290
Lingeries/Tights    212014
Ladies Accessories  72260
Menswear            70450
Other values (5)    105844

Length

Max length: 30
Median length: 22
Mean length: 11.05249593
Min length: 5

Characters and Unicode

Total characters: 13594570
Distinct characters: 41
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lingeries/Tights
2nd row: Ladieswear
3rd row: Divided
4th row: Children Sizes 134-170
5th row: Ladieswear

Common Values

Value                           Count    Frequency (%)
Ladieswear                      496142   40.3%
Divided                         273290   22.2%
Lingeries/Tights                212014   17.2%
Ladies Accessories              72260    5.9%
Menswear                        70450    5.7%
Sport                           47961    3.9%
Children Sizes 134-170          21369    1.7%
Children Sizes 92-140           18015    1.5%
Baby Sizes 50-98                12828    1.0%
Children Accessories, Swimwear  5671     0.5%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
ladieswear496142
35.0%
divided273290
19.3%
lingeries/tights212014
15.0%
accessories77931
 
5.5%
ladies72260
 
5.1%
menswear70450
 
5.0%
sizes52212
 
3.7%
sport47961
 
3.4%
children45055
 
3.2%
134-17021369
 
1.5%
Other values (4)49342
 
3.5%

Most occurring characters

ValueCountFrequency (%)
e2161562
15.9%
i1931893
14.2%
s1348885
9.9%
d1160037
8.5%
a1153493
8.5%
r955224
 
7.0%
L780416
 
5.7%
w577934
 
4.3%
g424028
 
3.1%
n327519
 
2.4%
Other values (31)2773579
20.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter11289218
83.0%
Uppercase Letter1577828
 
11.6%
Decimal Number269601
 
2.0%
Other Punctuation217685
 
1.6%
Space Separator188026
 
1.4%
Dash Punctuation52212
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2161562
19.1%
i1931893
17.1%
s1348885
11.9%
d1160037
10.3%
a1153493
10.2%
r955224
8.5%
w577934
 
5.1%
g424028
 
3.8%
n327519
 
2.9%
v273290
 
2.4%
Other values (10)975353
8.6%
Decimal Number
ValueCountFrequency (%)
160753
22.5%
052212
19.4%
439384
14.6%
930843
11.4%
321369
 
7.9%
721369
 
7.9%
218015
 
6.7%
512828
 
4.8%
812828
 
4.8%
Uppercase Letter
ValueCountFrequency (%)
L780416
49.5%
D273290
 
17.3%
T212014
 
13.4%
S105844
 
6.7%
A77931
 
4.9%
M70450
 
4.5%
C45055
 
2.9%
B12828
 
0.8%
Other Punctuation
ValueCountFrequency (%)
/212014
97.4%
,5671
 
2.6%
Space Separator
ValueCountFrequency (%)
188026
100.0%
Dash Punctuation
ValueCountFrequency (%)
-52212
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin12867046
94.6%
Common727524
 
5.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2161562
16.8%
i1931893
15.0%
s1348885
10.5%
d1160037
9.0%
a1153493
9.0%
r955224
7.4%
L780416
 
6.1%
w577934
 
4.5%
g424028
 
3.3%
n327519
 
2.5%
Other values (18)2046055
15.9%
Common
ValueCountFrequency (%)
/212014
29.1%
188026
25.8%
160753
 
8.4%
052212
 
7.2%
-52212
 
7.2%
439384
 
5.4%
930843
 
4.2%
321369
 
2.9%
721369
 
2.9%
218015
 
2.5%
Other values (3)31327
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII13594570
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e2161562
15.9%
i1931893
14.2%
s1348885
9.9%
d1160037
8.5%
a1153493
8.5%
r955224
 
7.0%
L780416
 
5.7%
w577934
 
4.3%
g424028
 
3.1%
n327519
 
2.4%
Other values (31)2773579
20.4%

index_group_no
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

1   780416
2   273290
3   70450
4   57883
26  47961

Length

Max length: 2
Median length: 1
Mean length: 1.038992683
Min length: 1

Characters and Unicode

Total characters: 1277961
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 2
4th row: 4
5th row: 1

Common Values

Value  Count    Frequency (%)
1      780416   63.4%
2      273290   22.2%
3      70450    5.7%
4      57883    4.7%
26     47961    3.9%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
1780416
63.4%
2273290
 
22.2%
370450
 
5.7%
457883
 
4.7%
2647961
 
3.9%

Most occurring characters

ValueCountFrequency (%)
1780416
61.1%
2321251
25.1%
370450
 
5.5%
457883
 
4.5%
647961
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1277961
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1780416
61.1%
2321251
25.1%
370450
 
5.5%
457883
 
4.5%
647961
 
3.8%

Most occurring scripts

ValueCountFrequency (%)
Common1277961
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1780416
61.1%
2321251
25.1%
370450
 
5.5%
457883
 
4.5%
647961
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII1277961
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1780416
61.1%
2321251
25.1%
370450
 
5.5%
457883
 
4.5%
647961
 
3.8%

index_group_name
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Ladieswear     780416
Divided        273290
Menswear       70450
Baby/Children  57883
Sport          47961

Length

Max length: 13
Median length: 10
Mean length: 9.165100813
Min length: 5

Characters and Unicode

Total characters: 11273074
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Ladieswear
2nd row: Ladieswear
3rd row: Divided
4th row: Baby/Children
5th row: Ladieswear

Common Values

Value          Count    Frequency (%)
Ladieswear     780416   63.4%
Divided        273290   22.2%
Menswear       70450    5.7%
Baby/Children  57883    4.7%
Sport          47961    3.9%
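
The counts for index_group_name are identical to those for index_group_no, which suggests that each number maps to exactly one name and explains the HIGH CORRELATION alert on both columns. A check of that kind can be sketched as follows. The function `is_one_to_one` is a hypothetical helper, and the pairing of numbers to names is inferred from the matching counts and sample rows rather than stated in the report.

```python
def is_one_to_one(pairs):
    """Return True if every code maps to one name and every name to one code."""
    code_to_name, name_to_code = {}, {}
    for code, name in pairs:
        # setdefault stores the first mapping seen and returns it afterwards,
        # so any later conflicting pair is detected here.
        if code_to_name.setdefault(code, name) != name:
            return False
        if name_to_code.setdefault(name, code) != code:
            return False
    return True


# ASSUMPTION: pairing inferred from identical counts in the two tables.
pairs = [
    (1, "Ladieswear"),
    (2, "Divided"),
    (3, "Menswear"),
    (4, "Baby/Children"),
    (26, "Sport"),
]

print(is_one_to_one(pairs))
print(is_one_to_one(pairs + [(5, "Sport")]))  # a second code for "Sport"
```

When two columns pass this check, one of them is redundant for modelling purposes. Pairs such as department_no (298 distinct values) and department_name (249 distinct values) cannot be one-to-one, since the distinct counts differ.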

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
ladieswear780416
63.4%
divided273290
 
22.2%
menswear70450
 
5.7%
baby/children57883
 
4.7%
sport47961
 
3.9%

Most occurring characters

ValueCountFrequency (%)
e2032905
18.0%
a1689165
15.0%
d1384879
12.3%
i1384879
12.3%
r956710
8.5%
s850866
7.5%
w850866
7.5%
L780416
 
6.9%
D273290
 
2.4%
v273290
 
2.4%
Other values (13)795808
 
7.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter9927308
88.1%
Uppercase Letter1287883
 
11.4%
Other Punctuation57883
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2032905
20.5%
a1689165
17.0%
d1384879
14.0%
i1384879
14.0%
r956710
9.6%
s850866
8.6%
w850866
8.6%
v273290
 
2.8%
n128333
 
1.3%
b57883
 
0.6%
Other values (6)317532
 
3.2%
Uppercase Letter
ValueCountFrequency (%)
L780416
60.6%
D273290
 
21.2%
M70450
 
5.5%
B57883
 
4.5%
C57883
 
4.5%
S47961
 
3.7%
Other Punctuation
ValueCountFrequency (%)
/57883
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin11215191
99.5%
Common57883
 
0.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2032905
18.1%
a1689165
15.1%
d1384879
12.3%
i1384879
12.3%
r956710
8.5%
s850866
7.6%
w850866
7.6%
L780416
 
7.0%
D273290
 
2.4%
v273290
 
2.4%
Other values (12)737925
 
6.6%
Common
ValueCountFrequency (%)
/57883
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII11273074
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e2032905
18.0%
a1689165
15.0%
d1384879
12.3%
i1384879
12.3%
r956710
8.5%
s850866
7.5%
w850866
7.5%
L780416
 
6.9%
D273290
 
2.4%
v273290
 
2.4%
Other values (13)795808
 
7.1%

section_no
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 57
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.82913659
Minimum: 2
Maximum: 97
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.8 MiB

Quantile statistics

Minimum: 2
5-th percentile: 5
Q1: 15
median: 47
Q3: 60
95-th percentile: 66
Maximum: 97
Range: 95
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 23.08040963
Coefficient of variation (CV): 0.6266888602
Kurtosis: -1.627263438
Mean: 36.82913659
Median Absolute Deviation (MAD): 21
Skewness: -0.01317993478
Sum: 45299838
Variance: 532.7053087
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value              Count    Frequency (%)
15                 221946   18.0%
53                 143474   11.7%
60                 92729    7.5%
61                 83901    6.8%
11                 80965    6.6%
16                 58506    4.8%
51                 49438    4.0%
6                  46281    3.8%
5                  44915    3.7%
62                 42374    3.4%
Other values (47)  365471   29.7%

Smallest values (ascending)
Value  Count    Frequency (%)
2      21544    1.8%
4      22       < 0.1%
5      44915    3.7%
6      46281    3.8%
8      9788     0.8%
11     80965    6.6%
14     4985     0.4%
15     221946   18.0%
16     58506    4.8%
17     18       < 0.1%

Largest values (descending)
Value  Count   Frequency (%)
97     1861    0.2%
82     2044    0.2%
80     2309    0.2%
79     5835    0.5%
77     8561    0.7%
76     6789    0.6%
72     4039    0.3%
71     68      < 0.1%
70     1165    0.1%
66     28869   2.3%

section_name
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 56
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Womens Everyday Collection  221946
Divided Collection          143474
Womens Swimwear, beachwear  92729
Womens Lingerie             83901
Womens Tailoring            80965
Other values (51)           606985

Length

Max length: 30
Median length: 26
Mean length: 18.95364472
Min length: 4

Characters and Unicode

Total characters: 23312983
Distinct characters: 48
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Womens Lingerie
2nd row: Womens Everyday Collection
3rd row: Divided Collection
4th row: Girls Underwear & Basics
5th row: Womens Everyday Collection

Common Values

Value                           Count    Frequency (%)
Womens Everyday Collection      221946   18.0%
Divided Collection              143474   11.7%
Womens Swimwear, beachwear      92729    7.5%
Womens Lingerie                 83901    6.8%
Womens Tailoring                80965    6.6%
Womens Everyday Basics          58506    4.8%
Divided Basics                  49438    4.0%
Womens Casual                   46281    3.8%
Ladies H&M Sport                44915    3.7%
Womens Nightwear, Socks & Tigh  42374    3.4%
Other values (46)               365471   29.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
womens739904
24.0%
collection365420
11.8%
everyday280452
 
9.1%
divided238577
 
7.7%
basics117818
 
3.8%
swimwear95346
 
3.1%
beachwear92729
 
3.0%
tailoring87204
 
2.8%
ladies84903
 
2.8%
lingerie83901
 
2.7%
Other values (49)899695
29.2%

Most occurring characters

ValueCountFrequency (%)
e2706043
 
11.6%
o1863408
 
8.0%
1855949
 
8.0%
i1770964
 
7.6%
s1493935
 
6.4%
n1489102
 
6.4%
a1218824
 
5.2%
r1019824
 
4.4%
l992992
 
4.3%
m984603
 
4.2%
Other values (38)7917339
34.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter18216520
78.1%
Uppercase Letter2941677
 
12.6%
Space Separator1855949
 
8.0%
Other Punctuation277107
 
1.2%
Math Symbol21544
 
0.1%
Decimal Number186
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2706043
14.9%
o1863408
10.2%
i1770964
9.7%
s1493935
 
8.2%
n1489102
 
8.2%
a1218824
 
6.7%
r1019824
 
5.6%
l992992
 
5.5%
m984603
 
5.4%
d935798
 
5.1%
Other values (12)3741027
20.5%
Uppercase Letter
ValueCountFrequency (%)
W739904
25.2%
C457760
15.6%
E287939
 
9.8%
D286680
 
9.7%
S280092
 
9.5%
B170310
 
5.8%
L169301
 
5.8%
T151862
 
5.2%
M126137
 
4.3%
H68710
 
2.3%
Other values (11)202982
 
6.9%
Other Punctuation
ValueCountFrequency (%)
&139387
50.3%
,137720
49.7%
Space Separator
ValueCountFrequency (%)
1855949
100.0%
Math Symbol
ValueCountFrequency (%)
+21544
100.0%
Decimal Number
ValueCountFrequency (%)
2186
100.0%

Most occurring scripts

Value              Count       Frequency (%)
Latin              21158197    90.8%
Common             2154786     9.2%

Most frequent character per script

Latin
Value              Count      Frequency (%)
e                  2706043    12.8%
o                  1863408    8.8%
i                  1770964    8.4%
s                  1493935    7.1%
n                  1489102    7.0%
a                  1218824    5.8%
r                  1019824    4.8%
l                  992992     4.7%
m                  984603     4.7%
d                  935798     4.4%
Other values (33)  6682704    31.6%

Common
Value              Count      Frequency (%)
(space)            1855949    86.1%
&                  139387     6.5%
,                  137720     6.4%
+                  21544      1.0%
2                  186        < 0.1%

Most occurring blocks

Value              Count       Frequency (%)
ASCII              23312983    100.0%

Most frequent character per block

ASCII
Value              Count      Frequency (%)
e                  2706043    11.6%
o                  1863408    8.0%
(space)            1855949    8.0%
i                  1770964    7.6%
s                  1493935    6.4%
n                  1489102    6.4%
a                  1218824    5.2%
r                  1019824    4.4%
l                  992992     4.3%
m                  984603     4.2%
Other values (38)  7917339    34.0%

garment_group_no
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1010.743174
Minimum: 1001
Maximum: 1025
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.8 MiB

Quantile statistics

Minimum: 1001
5-th percentile: 1002
Q1: 1005
Median: 1010
Q3: 1017
95-th percentile: 1021
Maximum: 1025
Range: 24
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 6.562653437
Coefficient of variation (CV): 0.006492899092
Kurtosis: -1.181688205
Mean: 1010.743174
Median Absolute Deviation (MAD): 6
Skewness: 0.2385878965
Sum: 1243214104
Variance: 43.06842013
Monotonicity: Not monotonic
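
Several of the figures reported for garment_group_no are tied together by simple identities, which allows a quick consistency check. The sketch below recomputes the range, the IQR, the variance, and the coefficient of variation from the other reported values. The variable names are illustrative and are not part of the report.

```python
# Values copied from the garment_group_no statistics in this report.
minimum, maximum = 1001, 1025
q1, q3 = 1005, 1017
mean = 1010.743174
std = 6.562653437

value_range = maximum - minimum  # reported Range: 24
iqr = q3 - q1                    # reported IQR: 12
variance = std ** 2              # reported Variance: 43.06842013
cv = std / mean                  # reported CV: 0.006492899092

print(value_range, iqr, round(variance, 5), round(cv, 9))
```

The very small CV reflects that the values are identifier-like codes clustered near 1000, so the spread is tiny relative to the mean.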
Histogram with fixed size bins (bins=21)
Value              Count     Frequency (%)
1005               200722    16.3%
1002               126038    10.2%
1017               114126    9.3%
1009               112538    9.1%
1018               96185     7.8%
1010               95802     7.8%
1003               91639     7.5%
1013               81614     6.6%
1019               69042     5.6%
1016               48954     4.0%
Other values (11)  193340    15.7%

Value              Count     Frequency (%)
1001               15739     1.3%
1002               126038    10.2%
1003               91639     7.5%
1005               200722    16.3%
1006               2918      0.2%
1007               28271     2.3%
1008               15791     1.3%
1009               112538    9.1%
1010               95802     7.8%
1011               8463      0.7%

Value              Count     Frequency (%)
1025               28396     2.3%
1023               8910      0.7%
1021               28964     2.4%
1020               29441     2.4%
1019               69042     5.6%
1018               96185     7.8%
1017               114126    9.3%
1016               48954     4.0%
1014               2650      0.2%
1013               81614     6.6%

garment_group_name
Categorical

HIGH CORRELATION

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

Jersey Fancy       200722
Jersey Basic       126038
Under-, Nightwear  114126
Trousers           112538
Swimwear           96185
Other values (16)  580391

Length

Max length: 29
Median length: 20
Mean length: 10.7135935
Min length: 5

Characters and Unicode

Total characters: 13177720
Distinct characters: 40
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
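
As an illustration of the sentence above, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over a few of the sample values. The helper name `category_counts` is hypothetical, and this is not necessarily how the report computes its tables. The two-letter codes map to the report's labels as follows: `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Zs` is Space Separator, `Po` is Other Punctuation, and `Pd` is Dash Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["Under-, Nightwear", "Knitwear", "Jersey Fancy"]
print(category_counts(sample))
```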

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Under-, Nightwear
2nd row: Knitwear
3rd row: Jersey Fancy
4th row: Jersey Basic
5th row: Knitwear

Common Values

Value              Count     Frequency (%)
Jersey Fancy       200722    16.3%
Jersey Basic       126038    10.2%
Under-, Nightwear  114126    9.3%
Trousers           112538    9.1%
Swimwear           96185     7.8%
Blouses            95802     7.8%
Knitwear           91639     7.5%
Dresses Ladies     81614     6.6%
Accessories        69042     5.6%
Trousers Denim     48954     4.0%
Other values (11)  193340    15.7%

Length

Histogram of lengths of the category
Value              Count     Frequency (%)
jersey             326760    17.4%
fancy              200722    10.7%
trousers           161492    8.6%
basic              126038    6.7%
under              114126    6.1%
nightwear          114126    6.1%
swimwear           96185     5.1%
blouses            95802     5.1%
knitwear           91639     4.9%
dresses            81614     4.3%
Other values (20)  468274    25.0%
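
The word counts above are consistent with splitting each category value on whitespace, lowercasing, and trimming surrounding punctuation: for example, "jersey" (326760) equals the sum of Jersey Fancy (200722) and Jersey Basic (126038), and "Under-, Nightwear" contributes "under" and "nightwear". The sketch below reproduces that behaviour under this assumption. The exact tokenisation used by the report is not stated, and `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercased whitespace-separated tokens, trimming edge punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)
            if token:  # skip tokens that were punctuation only
                counts[token] += 1
    return counts


print(word_counts(["Under-, Nightwear", "Jersey Fancy", "Jersey Basic"]))
```

Punctuation inside a token is kept, so a value such as "PRE-CREATE" would stay a single token, "pre-create".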

Most occurring characters

Value              Count      Frequency (%)
e                  1854585    14.1%
s                  1705653    12.9%
r                  1340972    10.2%
a                  751116     5.7%
i                  708868     5.4%
(space)            646778     4.9%
n                  537458     4.1%
y                  533318     4.0%
c                  502718     3.8%
o                  488336     3.7%
Other values (30)  4107918    31.2%

Most occurring categories

Value              Count       Frequency (%)
Lowercase Letter   10443472    79.3%
Uppercase Letter   1850732     14.0%
Space Separator    646778      4.9%
Other Punctuation  122612      0.9%
Dash Punctuation   114126      0.9%

Most frequent character per category

Lowercase Letter
Value              Count      Frequency (%)
e                  1854585    17.8%
s                  1705653    16.3%
r                  1340972    12.8%
a                  751116     7.2%
i                  708868     6.8%
n                  537458     5.1%
y                  533318     5.1%
c                  502718     4.8%
o                  488336     4.7%
w                  413874     4.0%
Other values (13)  1606574    15.4%

Uppercase Letter
Value              Count      Frequency (%)
J                  329678     17.8%
S                  226806     12.3%
B                  224758     12.1%
F                  200722     10.8%
T                  190456     10.3%
D                  149009     8.1%
U                  129865     7.0%
N                  114126     6.2%
K                  94557      5.1%
L                  81614      4.4%
Other values (3)   109141     5.9%

Other Punctuation
Value              Count      Frequency (%)
,                  114126     93.1%
/                  8486       6.9%

Space Separator
Value              Count      Frequency (%)
(space)            646778     100.0%

Dash Punctuation
Value              Count      Frequency (%)
-                  114126     100.0%

Most occurring scripts

Value              Count       Frequency (%)
Latin              12294204    93.3%
Common             883516      6.7%

Most frequent character per script

Latin
Value              Count      Frequency (%)
e                  1854585    15.1%
s                  1705653    13.9%
r                  1340972    10.9%
a                  751116     6.1%
i                  708868     5.8%
n                  537458     4.4%
y                  533318     4.3%
c                  502718     4.1%
o                  488336     4.0%
w                  413874     3.4%
Other values (26)  3457306    28.1%

Common
Value              Count      Frequency (%)
(space)            646778     73.2%
,                  114126     12.9%
-                  114126     12.9%
/                  8486       1.0%

Most occurring blocks

Value              Count       Frequency (%)
ASCII              13177720    100.0%

Most frequent character per block

ASCII
Value              Count      Frequency (%)
e                  1854585    14.1%
s                  1705653    12.9%
r                  1340972    10.2%
a                  751116     5.7%
i                  708868     5.4%
(space)            646778     4.9%
n                  537458     4.1%
y                  533318     4.0%
c                  502718     3.8%
o                  488336     3.7%
Other values (30)  4107918    31.2%

detail_desc
Categorical

HIGH CARDINALITY

Distinct: 36009
Distinct (%): 2.9%
Missing: 4475
Missing (%): 0.4%
Memory size: 18.8 MiB

Count      Value
8443       High-waisted jeans in washed superstretch denim with a zip fly and button, fake front pockets, real back pockets and super-skinny legs.
6015       5-pocket jeans in washed, superstretch denim with a regular waist, zip fly and button, and skinny legs.
5279       T-shirt in lightweight jersey with a rounded hem. Slightly longer at the back.
4784       Fully lined bikini bottoms with a mid waist and medium coverage at the back.
4325       Blouse in a soft weave with a narrow collar, concealed buttons down the front, long sleeves with buttoned cuffs and a rounded hem.
1196679    Other values (36004)

Length

Max length: 698
Median length: 441
Mean length: 135.7135081
Min length: 11

Characters and Unicode

Total characters: 166320297
Distinct characters: 92
Distinct categories: 14
Distinct scripts: 2
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5777
Unique (%): 0.5%

Sample

1st row: Lace push-up bra with underwired, moulded, padded cups for a larger bust and fuller cleavage. Narrow, adjustable shoulder straps and a narrow fastening at the back with two pairs of hooks and eyes.
2nd row: Long polo-neck jumper in a soft knit with long raglan sleeves and ribbing at the cuffs and hem.
3rd row: Cropped top in airy, fluted jersey with narrow, adjustable shoulder straps, buttons down the front and narrow, covered elastication and a tie detail at the hem.
4th row: Long-sleeved dress in cotton jersey with a seam at the waist and bell-shaped skirt.
5th row: Wide, long-sleeved jumper in a soft, rib knit containing some wool.

Common Values

Count      Frequency (%)  Value
8443       0.7%           High-waisted jeans in washed superstretch denim with a zip fly and button, fake front pockets, real back pockets and super-skinny legs.
6015       0.5%           5-pocket jeans in washed, superstretch denim with a regular waist, zip fly and button, and skinny legs.
5279       0.4%           T-shirt in lightweight jersey with a rounded hem. Slightly longer at the back.
4784       0.4%           Fully lined bikini bottoms with a mid waist and medium coverage at the back.
4325       0.4%           Blouse in a soft weave with a narrow collar, concealed buttons down the front, long sleeves with buttoned cuffs and a rounded hem.
3082       0.3%           T-shirt in soft jersey.
2855       0.2%           Fine-knit trainer socks in a soft cotton blend.
2786       0.2%           Lined, non-wired, triangle bikini top with a wide hem. Narrow, adjustable shoulder straps that can be fastened in different ways at the back and cups with removable inserts that shape the bust and provide good support. No fasteners.
2699       0.2%           Ankle-length cigarette trousers in a stretch weave with a regular waist, concealed zip in one side, fake back pockets and tapered legs with slits at the hems.
2686       0.2%           Round-necked T-shirt in soft cotton jersey.
1182571    96.1%          Other values (35999)
4475       0.4%           (Missing)

Length

Histogram of lengths of the category
Value                Count       Frequency (%)
and                  1761811     6.3%
a                    1727004     6.2%
with                 1676940     6.0%
the                  1484849     5.3%
in                   1162727     4.2%
at                   921066      3.3%
back                 505927      1.8%
waist                440039      1.6%
top                  357519      1.3%
front                355346      1.3%
Other values (4514)  17620823    62.9%

Most occurring characters

Value              Count       Frequency (%)
(space)            26788763    16.1%
e                  14914610    9.0%
t                  13469186    8.1%
a                  11462837    6.9%
n                  9795860     5.9%
i                  9678927     5.8%
s                  9256646     5.6%
o                  7392234     4.4%
d                  6840616     4.1%
r                  6815278     4.1%
Other values (82)  49905340    30.0%

Most occurring categories

Value              Count        Frequency (%)
Lowercase Letter   131338416    79.0%
Space Separator    26788763     16.1%
Other Punctuation  4064909      2.4%
Uppercase Letter   2410808      1.4%
Dash Punctuation   1359658      0.8%
Decimal Number     323393       0.2%
Open Punctuation   9757         < 0.1%
Close Punctuation  9757         < 0.1%
Other Symbol       8382         < 0.1%
Final Punctuation  5808         < 0.1%
Other values (4)   646          < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count       Frequency (%)
e                  14914610    11.4%
t                  13469186    10.3%
a                  11462837    8.7%
n                  9795860     7.5%
i                  9678927     7.4%
s                  9256646     7.0%
o                  7392234     5.6%
d                  6840616     5.2%
r                  6815278     5.2%
h                  6302467     4.8%
Other values (19)  35409755    27.0%

Uppercase Letter
Value              Count     Frequency (%)
S                  529115    21.9%
L                  263848    10.9%
T                  242186    10.0%
V                  179801    7.5%
F                  175722    7.3%
U                  103572    4.3%
C                  99846     4.1%
B                  86644     3.6%
A                  85612     3.6%
H                  82815     3.4%
Other values (17)  561647    23.3%

Other Punctuation
Value              Count      Frequency (%)
.                  2219772    54.6%
,                  1815388    44.7%
/                  25071      0.6%
&                  3411       0.1%
%                  969        < 0.1%
:                  126        < 0.1%
'                  103        < 0.1%
!                  43         < 0.1%
"                  20         < 0.1%
?                  6          < 0.1%

Decimal Number
Value              Count     Frequency (%)
5                  110261    34.1%
3                  47091     14.6%
4                  42421     13.1%
1                  30094     9.3%
2                  29494     9.1%
0                  25098     7.8%
8                  12978     4.0%
6                  10217     3.2%
7                  9086      2.8%
9                  6653      2.1%
Dash Punctuation
ValueCountFrequency (%)
-1344936
98.9%
14712
 
1.1%
10
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
7597
90.6%
®756
 
9.0%
°29
 
0.3%
Final Punctuation
ValueCountFrequency (%)
5781
99.5%
27
 
0.5%
Initial Punctuation
ValueCountFrequency (%)
3
60.0%
2
40.0%
Space Separator
Value              Count       Frequency (%)
(space)            26788763    100.0%

Open Punctuation
Value              Count       Frequency (%)
(                  9757        100.0%

Close Punctuation
Value              Count       Frequency (%)
)                  9757        100.0%

Other Number
Value              Count       Frequency (%)
½                  608         100.0%

Math Symbol
Value              Count       Frequency (%)
+                  29          100.0%

Modifier Symbol
Value              Count       Frequency (%)
´                  4           100.0%

Most occurring scripts

Value              Count        Frequency (%)
Latin              133749224    80.4%
Common             32571073     19.6%

Most frequent character per script

Latin
Value              Count       Frequency (%)
e                  14914610    11.2%
t                  13469186    10.1%
a                  11462837    8.6%
n                  9795860     7.3%
i                  9678927     7.2%
s                  9256646     6.9%
o                  7392234     5.5%
d                  6840616     5.1%
r                  6815278     5.1%
h                  6302467     4.7%
Other values (46)  37820563    28.3%

Common
Value              Count       Frequency (%)
(space)            26788763    82.2%
.                  2219772     6.8%
,                  1815388     5.6%
-                  1344936     4.1%
5                  110261      0.3%
3                  47091       0.1%
4                  42421       0.1%
1                  30094       0.1%
2                  29494       0.1%
0                  25098       0.1%
Other values (26)  117755      0.4%

Most occurring blocks

Value               Count        Frequency (%)
ASCII               166244942    > 99.9%
None                47223        < 0.1%
Punctuation         20535        < 0.1%
Letterlike Symbols  7597         < 0.1%

Most frequent character per block

ASCII
Value              Count       Frequency (%)
(space)            26788763    16.1%
e                  14914610    9.0%
t                  13469186    8.1%
a                  11462837    6.9%
n                  9795860     5.9%
i                  9678927     5.8%
s                  9256646     5.6%
o                  7392234     4.4%
d                  6840616     4.1%
r                  6815278     4.1%
Other values (67)  49829985    30.0%

None
Value              Count    Frequency (%)
ê                  38482    81.5%
é                  7124     15.1%
®                  756      1.6%
½                  608      1.3%
É                  212      0.4%
°                  29       0.1%
ñ                  8        < 0.1%
´                  4        < 0.1%
Punctuation
ValueCountFrequency (%)
14712
71.6%
5781
 
28.2%
27
 
0.1%
10
 
< 0.1%
3
 
< 0.1%
2
 
< 0.1%
Letterlike Symbols
ValueCountFrequency (%)
7597
100.0%

FN
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 708745
Missing (%): 57.6%
Memory size: 18.8 MiB

1.0    521255

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1563765
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value      Count     Frequency (%)
1.0        521255    42.4%
(Missing)  708745    57.6%
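
FN holds a single non-missing value (1.0) and is missing in 57.6% of rows, which is why it is flagged as constant. One possible way to make such a column usable is to read a missing entry as "flag not set" and convert the column to booleans. That reading is an assumption and is not confirmed by this report. A minimal sketch in plain Python, with hypothetical helper names:

```python
def flag_to_bool(values):
    """Map a 1.0-or-missing flag to booleans (assumes missing means not set)."""
    # None and NaN both compare unequal to 1.0, so they become False.
    return [v is not None and v == 1.0 for v in values]


def missing_share(n_missing, n_total):
    """Percentage of rows with a missing value."""
    return 100.0 * n_missing / n_total


print(flag_to_bool([1.0, None, 1.0]))
print(round(missing_share(708745, 1230000), 1))  # counts taken from the table above
```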

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
1.0      521255    100.0%

Most occurring characters

Value    Count     Frequency (%)
1        521255    33.3%
.        521255    33.3%
0        521255    33.3%

Most occurring categories

Value              Count      Frequency (%)
Decimal Number     1042510    66.7%
Other Punctuation  521255     33.3%

Most frequent character per category

Decimal Number
Value    Count     Frequency (%)
1        521255    50.0%
0        521255    50.0%

Other Punctuation
Value    Count     Frequency (%)
.        521255    100.0%

Most occurring scripts

Value    Count      Frequency (%)
Common   1563765    100.0%

Most frequent character per script

Common
Value    Count     Frequency (%)
1        521255    33.3%
.        521255    33.3%
0        521255    33.3%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    1563765    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
1        521255    33.3%
.        521255    33.3%
0        521255    33.3%

Active
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 716695
Missing (%): 58.3%
Memory size: 18.8 MiB

1.0    513305

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1539915
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value      Count     Frequency (%)
1.0        513305    41.7%
(Missing)  716695    58.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count     Frequency (%)
1.0      513305    100.0%

Most occurring characters

Value    Count     Frequency (%)
1        513305    33.3%
.        513305    33.3%
0        513305    33.3%

Most occurring categories

Value              Count      Frequency (%)
Decimal Number     1026610    66.7%
Other Punctuation  513305     33.3%

Most frequent character per category

Decimal Number
Value    Count     Frequency (%)
1        513305    50.0%
0        513305    50.0%

Other Punctuation
Value    Count     Frequency (%)
.        513305    100.0%

Most occurring scripts

Value    Count      Frequency (%)
Common   1539915    100.0%

Most frequent character per script

Common
Value    Count     Frequency (%)
1        513305    33.3%
.        513305    33.3%
0        513305    33.3%

Most occurring blocks

Value    Count      Frequency (%)
ASCII    1539915    100.0%

Most frequent character per block

ASCII
Value    Count     Frequency (%)
1        513305    33.3%
.        513305    33.3%
0        513305    33.3%

club_member_status
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 2527
Missing (%): 0.2%
Memory size: 18.8 MiB

ACTIVE      1199719
PRE-CREATE  27423
LEFT CLUB   331

Length

Max length: 10
Median length: 6
Mean length: 6.090173063
Min length: 6

Characters and Unicode

Total characters: 7475523
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ACTIVE
2nd row: ACTIVE
3rd row: ACTIVE
4th row: ACTIVE
5th row: ACTIVE

Common Values

Value       Count      Frequency (%)
ACTIVE      1199719    97.5%
PRE-CREATE  27423      2.2%
LEFT CLUB   331        < 0.1%
(Missing)   2527       0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value       Count      Frequency (%)
active      1199719    97.7%
pre-create  27423      2.2%
left        331        < 0.1%
club        331        < 0.1%

Most occurring characters

Value             Count      Frequency (%)
E                 1282319    17.2%
C                 1227473    16.4%
T                 1227473    16.4%
A                 1227142    16.4%
I                 1199719    16.0%
V                 1199719    16.0%
R                 54846      0.7%
P                 27423      0.4%
-                 27423      0.4%
L                 662        < 0.1%
Other values (4)  1324       < 0.1%

Most occurring categories

Value             Count      Frequency (%)
Uppercase Letter  7447769    99.6%
Dash Punctuation  27423      0.4%
Space Separator   331        < 0.1%

Most frequent character per category

Uppercase Letter
Value             Count      Frequency (%)
E                 1282319    17.2%
C                 1227473    16.5%
T                 1227473    16.5%
A                 1227142    16.5%
I                 1199719    16.1%
V                 1199719    16.1%
R                 54846      0.7%
P                 27423      0.4%
L                 662        < 0.1%
F                 331        < 0.1%
Other values (2)  662        < 0.1%

Dash Punctuation
Value             Count      Frequency (%)
-                 27423      100.0%

Space Separator
Value             Count      Frequency (%)
(space)           331        100.0%

Most occurring scripts

Value             Count      Frequency (%)
Latin             7447769    99.6%
Common            27754      0.4%

Most frequent character per script

Latin
Value             Count      Frequency (%)
E                 1282319    17.2%
C                 1227473    16.5%
T                 1227473    16.5%
A                 1227142    16.5%
I                 1199719    16.1%
V                 1199719    16.1%
R                 54846      0.7%
P                 27423      0.4%
L                 662        < 0.1%
F                 331        < 0.1%
Other values (2)  662        < 0.1%

Common
Value             Count      Frequency (%)
-                 27423      98.8%
(space)           331        1.2%

Most occurring blocks

Value             Count      Frequency (%)
ASCII             7475523    100.0%

Most frequent character per block

ASCII
Value             Count      Frequency (%)
E                 1282319    17.2%
C                 1227473    16.4%
T                 1227473    16.4%
A                 1227142    16.4%
I                 1199719    16.0%
V                 1199719    16.0%
R                 54846      0.7%
P                 27423      0.4%
-                 27423      0.4%
L                 662        < 0.1%
Other values (4)  1324       < 0.1%

fashion_news_frequency
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 5743
Missing (%): 0.5%
Memory size: 18.8 MiB

NONE       701463
Regularly  522367
Monthly    427

Length

Max length: 9
Median length: 4
Mean length: 6.134450528
Min length: 4

Characters and Unicode

Total characters: 7510144
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NONE
2nd row: Regularly
3rd row: Regularly
4th row: NONE
5th row: Regularly

Common Values

Value      Count     Frequency (%)
NONE       701463    57.0%
Regularly  522367    42.5%
Monthly    427       < 0.1%
(Missing)  5743      0.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value      Count     Frequency (%)
none       701463    57.3%
regularly  522367    42.7%
monthly    427       < 0.1%

Most occurring characters

Value             Count      Frequency (%)
N                 1402926    18.7%
l                 1045161    13.9%
O                 701463     9.3%
E                 701463     9.3%
y                 522794     7.0%
R                 522367     7.0%
e                 522367     7.0%
g                 522367     7.0%
u                 522367     7.0%
a                 522367     7.0%
Other values (6)  524502     7.0%

Most occurring categories

Value             Count      Frequency (%)
Lowercase Letter  4181498    55.7%
Uppercase Letter  3328646    44.3%

Most frequent character per category

Lowercase Letter
Value             Count      Frequency (%)
l                 1045161    25.0%
y                 522794     12.5%
e                 522367     12.5%
g                 522367     12.5%
u                 522367     12.5%
a                 522367     12.5%
r                 522367     12.5%
o                 427        < 0.1%
n                 427        < 0.1%
t                 427        < 0.1%

Uppercase Letter
Value             Count      Frequency (%)
N                 1402926    42.1%
O                 701463     21.1%
E                 701463     21.1%
R                 522367     15.7%
M                 427        < 0.1%

Most occurring scripts

Value             Count      Frequency (%)
Latin             7510144    100.0%

Most frequent character per script

Latin
Value             Count      Frequency (%)
N                 1402926    18.7%
l                 1045161    13.9%
O                 701463     9.3%
E                 701463     9.3%
y                 522794     7.0%
R                 522367     7.0%
e                 522367     7.0%
g                 522367     7.0%
u                 522367     7.0%
a                 522367     7.0%
Other values (6)  524502     7.0%

Most occurring blocks

Value             Count      Frequency (%)
ASCII             7510144    100.0%

Most frequent character per block

ASCII
Value             Count      Frequency (%)
N                 1402926    18.7%
l                 1045161    13.9%
O                 701463     9.3%
E                 701463     9.3%
y                 522794     7.0%
R                 522367     7.0%
e                 522367     7.0%
g                 522367     7.0%
u                 522367     7.0%
a                 522367     7.0%
Other values (6)  524502     7.0%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 83
Distinct (%): < 0.1%
Missing: 5741
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.06006164
Minimum: 16
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 18.8 MiB

Quantile statistics

Minimum: 16
5-th percentile: 21
Q1: 25
Median: 31
Q3: 47
95-th percentile: 59
Maximum: 99
Range: 83
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 13.01484531
Coefficient of variation (CV): 0.3609213274
Kurtosis: -0.6333019993
Mean: 36.06006164
Median Absolute Deviation (MAD): 8
Skewness: 0.6609682013
Sum: 44146855
Variance: 169.3861985
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count     Frequency (%)
25                 60940     5.0%
26                 60857     4.9%
24                 57759     4.7%
27                 56890     4.6%
23                 52069     4.2%
28                 51533     4.2%
29                 47786     3.9%
30                 45030     3.7%
22                 43149     3.5%
21                 42828     3.5%
Other values (73)  705418    57.4%

Value              Count     Frequency (%)
16                 69        < 0.1%
17                 2407      0.2%
18                 7976      0.6%
19                 17206     1.4%
20                 30427     2.5%
21                 42828     3.5%
22                 43149     3.5%
23                 52069     4.2%
24                 57759     4.7%
25                 60940     5.0%

Value              Count     Frequency (%)
99                 3         < 0.1%
98                 4         < 0.1%
97                 1         < 0.1%
95                 4         < 0.1%
94                 5         < 0.1%
93                 5         < 0.1%
92                 9         < 0.1%
91                 14        < 0.1%
90                 12        < 0.1%
89                 4         < 0.1%

postal_code
Categorical

HIGH CARDINALITY

Distinct: 254541
Distinct (%): 20.7%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 MiB

2c29ae653a9282cce4151bd87643c907644e09541abc28ae87dea0d1f6603b1c  26486
5b7eb31eabebd3277de632b82267286d847fd5d44287ee150bb4206b48439145  229
7c1fa3b0ec1d37ce2c3f34f63bd792f3b4494f324b6be5d1e4ba6a75456b96a7  220
1f5bd429acc88fbbf24de844a59e438704aa8761bc7b99fd977cad297c50b74c  206
a5ca21aefc3cf90afd9b09faf3b0f8f3c423d4f1cfb4c2e33a1b86770e426fa8  206
Other values (254536)                                             1202653

Length

Max length: 64
Median length: 64
Mean length: 64
Min length: 64

Characters and Unicode

Total characters: 78720000
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 64561
Unique (%): 5.2%

Sample

1st row: 2fb862f50c007c58b21e045956ca10469feccd2dbe7ccc81e596dc5d926992f3
2nd row: f083eb09535f454fe68dfcae389b759cf9b71ff45271cb0a1a99996a9a6be1e6
3rd row: 624cb91bb12c602f4ce8165bb2b16af165ee85b4ee06add41f0d6a5fa61038d0
4th row: aa49a6081cd770489f90dfdf8677a1957110e505320d923bf4e913686569a8bb
5th row: c84181c2d1711d7b9e76fcea1f045409a608320eaad5338d096196a92a8c1785
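
Every postal_code value is 64 characters long and the column uses only 16 distinct characters (the digits 0-9 and the letters a-f, per the character tables for this variable). That shape is consistent with a hex-encoded digest, although the report does not say how the values were produced. The sketch below validates only the shape; `looks_like_hex64` is a hypothetical helper.

```python
import re

# Exactly 64 lowercase hexadecimal characters.
HEX64 = re.compile(r"[0-9a-f]{64}")


def looks_like_hex64(value):
    """Return True if `value` is a 64-character lowercase hex string."""
    return HEX64.fullmatch(value) is not None


first_sample = "2fb862f50c007c58b21e045956ca10469feccd2dbe7ccc81e596dc5d926992f3"
print(looks_like_hex64(first_sample))
```

A check like this can catch truncated or re-encoded values before the column is used as a join or grouping key.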

Common Values

Value                                                             Count      Frequency (%)
2c29ae653a9282cce4151bd87643c907644e09541abc28ae87dea0d1f6603b1c  26486      2.2%
5b7eb31eabebd3277de632b82267286d847fd5d44287ee150bb4206b48439145  229        < 0.1%
7c1fa3b0ec1d37ce2c3f34f63bd792f3b4494f324b6be5d1e4ba6a75456b96a7  220        < 0.1%
1f5bd429acc88fbbf24de844a59e438704aa8761bc7b99fd977cad297c50b74c  206        < 0.1%
a5ca21aefc3cf90afd9b09faf3b0f8f3c423d4f1cfb4c2e33a1b86770e426fa8  206        < 0.1%
2790324c84cdb8ba471be2a199cfb5103bbe1ab10883a0312b6928b05d6ee6c4  176        < 0.1%
a1959a16bf167858c93a66ec2a330644512b25fb10f97eee2058549885af4dbd  173        < 0.1%
9d5787501bf1c77592156ba51eab13f4a2670c807686431a9e22a69090b02358  172        < 0.1%
cc4ed85e30f4977dae47662ddc468cd2eec11472de6fac5ec985080fd92243c8  163        < 0.1%
3eb41c8511d4e04fc0f02452e6e15d206d0c0e9d0f25ff79aeeea7f62561d5a5  143        < 0.1%
Other values (254531)                                             1201826    97.7%

Length

Histogram of lengths of the category
Value                                                             Count      Frequency (%)
2c29ae653a9282cce4151bd87643c907644e09541abc28ae87dea0d1f6603b1c  26486      2.2%
5b7eb31eabebd3277de632b82267286d847fd5d44287ee150bb4206b48439145  229        < 0.1%
7c1fa3b0ec1d37ce2c3f34f63bd792f3b4494f324b6be5d1e4ba6a75456b96a7  220        < 0.1%
a5ca21aefc3cf90afd9b09faf3b0f8f3c423d4f1cfb4c2e33a1b86770e426fa8  206        < 0.1%
1f5bd429acc88fbbf24de844a59e438704aa8761bc7b99fd977cad297c50b74c  206        < 0.1%
2790324c84cdb8ba471be2a199cfb5103bbe1ab10883a0312b6928b05d6ee6c4  176        < 0.1%
a1959a16bf167858c93a66ec2a330644512b25fb10f97eee2058549885af4dbd  173        < 0.1%
9d5787501bf1c77592156ba51eab13f4a2670c807686431a9e22a69090b02358  172        < 0.1%
cc4ed85e30f4977dae47662ddc468cd2eec11472de6fac5ec985080fd92243c8  163        < 0.1%
3eb41c8511d4e04fc0f02452e6e15d206d0c0e9d0f25ff79aeeea7f62561d5a5  143        < 0.1%
Other values (254531)                                             1201826    97.7%

Most occurring characters

Value             Count       Frequency (%)
c                 4965077     6.3%
a                 4958163     6.3%
e                 4953247     6.3%
6                 4948682     6.3%
4                 4946499     6.3%
2                 4941698     6.3%
1                 4934616     6.3%
8                 4926926     6.3%
9                 4916596     6.2%
0                 4913973     6.2%
Other values (6)  29314523    37.2%

Most occurring categories

Value             Count       Frequency (%)
Decimal Number    49221446    62.5%
Lowercase Letter  29498554    37.5%

Most frequent character per category

Decimal Number
Value    Count      Frequency (%)
6        4948682    10.1%
4        4946499    10.0%
2        4941698    10.0%
1        4934616    10.0%
8        4926926    10.0%
9        4916596    10.0%
0        4913973    10.0%
3        4902163    10.0%
7        4899189    10.0%
5        4891104    9.9%

Lowercase Letter
Value    Count      Frequency (%)
c        4965077    16.8%
a        4958163    16.8%
e        4953247    16.8%
b        4897737    16.6%
d        4888303    16.6%
f        4836027    16.4%

Most occurring scripts

Value    Count       Frequency (%)
Common   49221446    62.5%
Latin    29498554    37.5%

Most frequent character per script

Common
Value    Count      Frequency (%)
6        4948682    10.1%
4        4946499    10.0%
2        4941698    10.0%
1        4934616    10.0%
8        4926926    10.0%
9        4916596    10.0%
0        4913973    10.0%
3        4902163    10.0%
7        4899189    10.0%
5        4891104    9.9%

Latin
Value    Count      Frequency (%)
c        4965077    16.8%
a        4958163    16.8%
e        4953247    16.8%
b        4897737    16.6%
d        4888303    16.6%
f        4836027    16.4%

Most occurring blocks

Value             Count       Frequency (%)
ASCII             78720000    100.0%

Most frequent character per block

ASCII
Value             Count       Frequency (%)
c                 4965077     6.3%
a                 4958163     6.3%
e                 4953247     6.3%
6                 4948682     6.3%
4                 4946499     6.3%
2                 4941698     6.3%
1                 4934616     6.3%
8                 4926926     6.3%
9                 4916596     6.2%
0                 4913973     6.2%
Other values (6)  29314523    37.2%

Interactions

2022-11-10T16:01:13.174826image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:20.384512image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:27.060486image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:34.014388image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:40.967855image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:47.898506image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:54.758475image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:01.669883image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:08.809484image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:16.287244image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:00.002213image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:06.864173image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:13.738234image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:20.989855image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:27.620702image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:34.605036image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:41.537522image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:48.448184image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:55.315521image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:02.207530image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:09.359648image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:16.893522image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:00.577493image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:07.455414image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:14.302598image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:21.577480image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:28.217389image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:35.180108image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:42.125282image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:49.064554image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:55.876399image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:02.810692image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:09.886868image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:17.469820image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:01.210220image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:08.055331image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:14.903508image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:22.182381image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:28.827261image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:35.817401image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:42.729531image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:49.683197image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:01:56.465394image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:03.455714image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/
2022-11-10T16:02:10.451217image/svg+xmlMatplotlib v3.5.1, https://matplotlib.org/

Correlations


Auto

The auto setting is an easily interpretable pairwise column metric with the following mapping (vartype-vartype: method): categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
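The mapping above can be read as a small dispatch rule. The sketch below is only an illustration of that rule; the function name `auto_method` and the type labels `"numerical"` and `"categorical"` are hypothetical and are not taken from the profiling library.

```python
def auto_method(type_a, type_b):
    """Pick a correlation method for a pair of column types.

    Hypothetical helper that mirrors the mapping described in the text:
    two numerical columns use Spearman's rho, every other pair uses
    Cramer's V (a numerical column would be discretized first).
    """
    allowed = {"numerical", "categorical"}
    if type_a not in allowed or type_b not in allowed:
        raise ValueError("unsupported column type")
    if type_a == "numerical" and type_b == "numerical":
        return "spearman"
    return "cramers_v"


pair_methods = {
    ("price", "age"): auto_method("numerical", "numerical"),
    ("price", "section_name"): auto_method("numerical", "categorical"),
    ("index_name", "section_name"): auto_method("categorical", "categorical"),
}
```

The column names in `pair_methods` come from the dataset, but the types assigned to them here are assumptions made for the example.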

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
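As a minimal sketch of this calculation, the code below ranks both variables (tied values share their average rank) and then divides the covariance of the ranks by the product of their standard deviations. The helper names and the toy data are illustrative assumptions and are not taken from the report.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)  # assumes neither variable is constant


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

On this toy data the relationship is perfectly monotonic but not linear, so `rho` reaches 1 while `r` stays below it.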

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
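The formula can be written out directly. This is a minimal sketch on made-up numbers (not values from the dataset) that also checks the invariance property: rescaling and shifting one variable leaves the magnitude of r unchanged.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)  # assumes neither variable is constant


# Illustrative data only.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 9.0, 12.0]

r_original = pearson_r(x, y)
# Change the location and scale of y: r keeps the same value.
r_rescaled = pearson_r(x, [100.0 + 0.5 * v for v in y])
# A negative scale factor flips the sign but not the magnitude.
r_flipped = pearson_r(x, [-3.0 * v for v in y])
```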

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
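The pair counting can be sketched as follows. This follows the description above literally (often called τ-a) and does not adjust for ties; whether the report itself uses a tie-adjusted variant is not stated here, so treat the function as an illustration on toy data rather than a reproduction of the report's numbers.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
    return (concordant - discordant) / len(pairs)


# Toy data: one of the six pairs (the middle two observations) is discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)  # (5 - 1) / 6
```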

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
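A minimal sketch of the bias-corrected estimator, computed from a contingency table of counts, is shown below. The function name and the two toy tables are assumptions made for illustration; this is not the profiling library's own code.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramer's V (Bergsma, 2013) for a table of counts.

    `table` is a list of rows, each a list of non-negative counts.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    # Bias corrections applied to phi^2 and to the table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (n_rows - 1) * (n_cols - 1) / (n - 1))
    rows_corr = n_rows - (n_rows - 1) ** 2 / (n - 1)
    cols_corr = n_cols - (n_cols - 1) ** 2 / (n - 1)
    denom = min(rows_corr - 1, cols_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy tables: one perfectly associated, one independent.
v_perfect = cramers_v_corrected([[10, 0], [0, 10]])
v_independent = cramers_v_corrected([[5, 5], [5, 5]])
```

The first table puts every observation on the diagonal, so the corrected coefficient still reaches 1; the second has identical rows, so it is 0.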

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
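Nullity correlation, as used in the heatmap, can be sketched as the Pearson correlation between two presence indicators (1 where a value is present, 0 where it is missing). The rows below are invented for illustration; only the column names FN, Active and age are borrowed from the dataset, and the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)  # assumes both indicators vary


def nullity_correlation(rows, col_a, col_b):
    """Correlate the presence (1.0) or absence (0.0) of two columns."""
    present_a = [0.0 if row[col_a] is None else 1.0 for row in rows]
    present_b = [0.0 if row[col_b] is None else 1.0 for row in rows]
    return pearson_r(present_a, present_b)


# Invented rows: FN and Active are always missing together.
rows = [
    {"FN": None, "Active": None, "age": 34.0},
    {"FN": 1.0, "Active": 1.0, "age": None},
    {"FN": 1.0, "Active": 1.0, "age": 40.0},
    {"FN": None, "Active": None, "age": 27.0},
]
fn_vs_active = nullity_correlation(rows, "FN", "Active")
fn_vs_age = nullity_correlation(rows, "FN", "age")
```

A value near 1 means two columns tend to be present or missing together; a negative value means one tends to be present when the other is missing. Columns that are never or always missing have no variance and would need to be excluded before this calculation.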

Sample

First rows

index | customer_id | article_id | price | sales_channel_id | sale | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | perceived_colour_value_id | perceived_colour_value_name | perceived_colour_master_id | perceived_colour_master_name | department_no | department_name | index_code | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | detail_desc | FN | Active | club_member_status | fashion_news_frequency | age | postal_code
0 | f05a521a2649a53841d0c5c837efb1d48e2eff7a6f6e47f94f0e21665d7adaa3 | 529008044 | 0.024288 | 2 | yes | 529008 | Hazelnut Push Melbourne | 306 | Bra | Underwear | 1010016 | Solid | 31 | Light Orange | 1 | Dusty Light | 3 | Orange | 1338 | Expressive Lingerie | B | Lingeries/Tights | 1 | Ladieswear | 61 | Womens Lingerie | 1017 | Under-, Nightwear | Lace push-up bra with underwired, moulded, padded cups for a larger bust and fuller cleavage. Narrow, adjustable shoulder straps and a narrow fastening at the back with two pairs of hooks and eyes. | NaN | NaN | ACTIVE | NONE | 34.0 | 2fb862f50c007c58b21e045956ca10469feccd2dbe7ccc81e596dc5d926992f3
1 | 58afa373cb889cda30831ba3ca728bbb4147d5c1f3d19060f003bf5713d7f4f5 | 537688014 | 0.040661 | 2 | yes | 537688 | Rachel | 252 | Sweater | Garment Upper body | 1010010 | Melange | 8 | Dark Grey | 4 | Dark | 12 | Grey | 1626 | Knitwear | A | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1003 | Knitwear | Long polo-neck jumper in a soft knit with long raglan sleeves and ribbing at the cuffs and hem. | 1.0 | 1.0 | ACTIVE | Regularly | 29.0 | f083eb09535f454fe68dfcae389b759cf9b71ff45271cb0a1a99996a9a6be1e6
2 | 317ea97640e31f706565f2b61f17652ac569f05c1abc47fdf9fb2c4b446ca343 | 872298001 | 0.006085 | 1 | yes | 872298 | Bonina loose tank | 253 | Vest top | Garment Upper body | 1010016 | Solid | 10 | White | 3 | Light | 9 | White | 1640 | Tops Fancy Jersey | D | Divided | 2 | Divided | 53 | Divided Collection | 1005 | Jersey Fancy | Cropped top in airy, fluted jersey with narrow, adjustable shoulder straps, buttons down the front and narrow, covered elastication and a tie detail at the hem. | 1.0 | 1.0 | ACTIVE | Regularly | 40.0 | 624cb91bb12c602f4ce8165bb2b16af165ee85b4ee06add41f0d6a5fa61038d0
3 | 6559a47c9760bc36d3f7a7497306daa1ea9ce4a3a340a0abfe07325b76f4cd1e | 562455002 | 0.025407 | 2 | yes | 562455 | Edit fancy dress | 265 | Dress | Garment Full body | 1010001 | All over pattern | 9 | Black | 4 | Dark | 5 | Black | 7930 | Young Girl Jersey Basic | I | Children Sizes 134-170 | 4 | Baby/Children | 79 | Girls Underwear & Basics | 1002 | Jersey Basic | Long-sleeved dress in cotton jersey with a seam at the waist and bell-shaped skirt. | NaN | NaN | ACTIVE | NONE | 27.0 | aa49a6081cd770489f90dfdf8677a1957110e505320d923bf4e913686569a8bb
4 | 10292f992bbf7a999f8f2eee6c1b2de299ee1279e369223b73c8baf6d65fce21 | 504154034 | 0.015237 | 2 | yes | 504154 | Lady Di | 252 | Sweater | Garment Upper body | 1010016 | Solid | 73 | Dark Blue | 4 | Dark | 2 | Blue | 1626 | Knitwear | A | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1003 | Knitwear | Wide, long-sleeved jumper in a soft, rib knit containing some wool. | 1.0 | NaN | ACTIVE | Regularly | 61.0 | c84181c2d1711d7b9e76fcea1f045409a608320eaad5338d096196a92a8c1785
5 | fec78952f447ad721a39d19428132705b7fc4dfbda0d66a39b7ad439cbaae4e2 | 704150008 | 0.025407 | 2 | yes | 704150 | FUN FANCY CREW | 252 | Sweater | Garment Upper body | 1010002 | Application/3D | 71 | Light Blue | 1 | Dusty Light | 2 | Blue | 7648 | Kids Boy Jersey Fancy | H | Children Sizes 92-140 | 4 | Baby/Children | 46 | Kids Boy | 1005 | Jersey Fancy | Long-sleeved top in sweatshirt fabric with a motif on the front and ribbing around the neckline, cuffs and hem. | NaN | NaN | PRE-CREATE | NONE | 53.0 | 8a8480eb9fd1930ada443e904d3c1bf4eedd35f1647dd29bea6fc523f3083353
6 | c392502a2b7758c9504a2cd5ec6ce432a38a3863594b96727519cecc603a28e9 | 677511001 | 0.008458 | 1 | yes | 677511 | Basic Kjell bracelet pk | 68 | Bracelet | Accessories | 1010016 | Solid | 5 | Gold | 5 | Bright | 15 | Metal | 4344 | Jewellery | C | Ladies Accessories | 1 | Ladieswear | 66 | Womens Small accessories | 1019 | Accessories | Slightly elasticated metal bracelets. | NaN | NaN | ACTIVE | NONE | 24.0 | 0fc3491c64d73d2507f376cac6cadd23a3d012c3b148ad9b406052e4b98373d0
7 | 9f842fd2d47f3330c54b25c8128285953dc0fa0dc9c9df196bb278f811524ef5 | 863620007 | 0.016932 | 2 | yes | 863620 | Archie | 254 | Top | Garment Upper body | 1010017 | Stripe | 73 | Dark Blue | 4 | Dark | 2 | Blue | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Straight-cut top in soft cotton jersey with a boat neck and long sleeves. | NaN | NaN | ACTIVE | NONE | 22.0 | fd6f02ef5d58f9a461a747461f01e65f7a13e2702e72d3abdc542ab333a5800e
8 | fca73f5109fa6309a4db5330d1b1d2009309fe1d51ae3b95b754969aa34c68bc | 761221002 | 0.042356 | 1 | yes | 761221 | FRIDA PILE HOOD | 308 | Hoodie | Garment Upper body | 1010016 | Solid | 11 | Off White | 1 | Dusty Light | 9 | White | 1660 | Jersey | A | Ladieswear | 1 | Ladieswear | 6 | Womens Casual | 1005 | Jersey Fancy | Wide top in soft pile with a drawstring hood, kangaroo pocket, dropped shoulders and long sleeves. | 1.0 | 1.0 | ACTIVE | Regularly | 19.0 | 9873eb374893e0e8d4612ac48d077c17ee583c908c3743c4b893a59f689d4b8c
9 | 82b3e74a6dedb8e12d55abfb0c773ff64f74aba6cb8dc35a05ae96f25e25f91e | 658298007 | 0.021356 | 2 | yes | 658298 | Skirt Mini | 275 | Skirt | Garment Lower body | 1010023 | Denim | 71 | Light Blue | 1 | Dusty Light | 2 | Blue | 1422 | Skirt | A | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1012 | Skirts | 5-pocket skirt in washed denim with a high waist, button fly and frayed, raw-edge hem. | 1.0 | 1.0 | ACTIVE | Regularly | 29.0 | 3a47e8b067098201887b9d639fae33b39a5cbc376bcf238d6914162bb3d88678

Last rows

index | customer_id | article_id | price | sales_channel_id | sale | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | perceived_colour_value_id | perceived_colour_value_name | perceived_colour_master_id | perceived_colour_master_name | department_no | department_name | index_code | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | detail_desc | FN | Active | club_member_status | fashion_news_frequency | age | postal_code
1229990 | 3233f9c5309a304ed4aadad23a9095cb7910b3cee84960e9cdfe8ad3c215b86b | 830365003 | 0.033881 | 2 | yes | 830365 | Anais. | 258 | Blouse | Garment Upper body | 1010001 | All over pattern | 9 | Black | 4 | Dark | 5 | Black | 1522 | Blouse | A | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1010 | Blouses | Blouse in woven fabric with a sweetheart neckline and long puff sleeves that are elasticated at the top and have a covered button at the cuffs. Gathered elastication at the front and down the sides and a smocked section at the back. Unlined. | NaN | NaN | ACTIVE | NONE | 22.0 | 1896679852dbefe6480542b1a4e52541bdf6869286d93757c2b5c76cb35f787a
1229991 | 10fd376f323e87a41973f10321d8286364bfe036c05fe0af403a47da11276be6 | 554477026 | 0.013542 | 1 | yes | 554477 | Victoria Pull- On TRS | 272 | Trousers | Garment Lower body | 1010001 | All over pattern | 9 | Black | 4 | Dark | 5 | Black | 1747 | Trousers | D | Divided | 2 | Divided | 53 | Divided Collection | 1009 | Trousers | Ankle-length trousers in an airy viscose weave with a regular, elasticated waist, side pockets and slightly wider, tapered legs. | NaN | NaN | ACTIVE | NONE | 29.0 | 6f337cea300ee5fbf8edcc0c68ba3dbba67a33be5c18c069576a0182c6c8ce67
1229992 | c18088b5b9a7d6f0c26b266fd2797ae9124a3050315411741d1ed5da59e58041 | 756320020 | 0.033881 | 1 | yes | 756320 | Lindsay Sl-set (W) | 297 | Pyjama set | Nightwear | 1010001 | All over pattern | 9 | Black | 4 | Dark | 5 | Black | 3709 | Nightwear | B | Lingeries/Tights | 1 | Ladieswear | 62 | Womens Nightwear, Socks & Tigh | 1017 | Under-, Nightwear | Pyjama top and shorts in soft satin. V-neck cami top with adjustable spaghetti shoulder straps and lace at the top. Short shorts with narrow elastication at the waist and lace-trimmed hems. | 1.0 | 1.0 | ACTIVE | Regularly | 43.0 | 2c29ae653a9282cce4151bd87643c907644e09541abc28ae87dea0d1f6603b1c
1229993 | 3bce1f7f7b4050e16553ae6a24e3a685f8fe754725bcb73bb283673a6fcbb697 | 753061003 | 0.008458 | 2 | yes | 753061 | Drake | 265 | Dress | Garment Full body | 1010017 | Stripe | 11 | Off White | 3 | Light | 9 | White | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Short, sleeveless dress in soft cotton and modal jersey with a deep neckline and a narrow, elasticated seam at the waist. | 1.0 | 1.0 | ACTIVE | Regularly | 38.0 | 70f3c4ef6273aa575da3fdc5aed6e6441e14cf9717800d1a5d781dd88bffa6a0
1229994 | a78dfa6fe5f13c9e62dfe8b7c14b75fbabc9c41bbd39b59828a4bd80050a5dc8 | 866383003 | 0.020322 | 2 | yes | 866383 | Push it Push Bra. | 298 | Bikini top | Swimwear | 1010026 | Other structure | 50 | Other Pink | 5 | Bright | 4 | Pink | 4242 | Swimwear | B | Lingeries/Tights | 1 | Ladieswear | 60 | Womens Swimwear, beachwear | 1018 | Swimwear | Lined bikini top with padded cups for a larger bust and fuller cleavage. Wide shoulder straps and a metal fastener at the back. | NaN | NaN | ACTIVE | NONE | 24.0 | decd492b4b79d4648348cafcf8328556af70bdf8f1a0a0197bdd9d71fe561411
1229995 | b8f97d5de0b32a78f4c66349001852443047790b3d6b5a4c228d38f2b1f833a5 | 664405002 | 0.012576 | 1 | yes | 664405 | Virgo Hip belt | 67 | Belt | Accessories | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 3509 | Belts | C | Ladies Accessories | 1 | Ladieswear | 65 | Womens Big accessories | 1019 | Accessories | Belt with a metal buckle. Width 2.5 cm. | NaN | NaN | ACTIVE | NONE | 54.0 | cba3b70e9265ee425109d7d8e26abe9438a7f9c74f94dcdddcf15904c7d6496f
1229996 | ad0c35bb8dae968d35a52c3ef2eac21fc09cceef33a0f3872918d7c88abcd829 | 685816002 | 0.008458 | 2 | yes | 685816 | RONNY REG RN T-SHIRT | 255 | T-shirt | Garment Upper body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 5832 | Light Basic Jersey | F | Menswear | 3 | Menswear | 26 | Men Underwear | 1002 | Jersey Basic | Round-necked T-shirt in soft cotton jersey. | NaN | NaN | ACTIVE | NONE | 22.0 | 14b2909c3d1349c6cb31518cbe321863385799052b4b51b28ab0171ef410aed0
1229997 | 11ede6d0f206c2b22c6e0aaedcd7018843152cbbcb76996f98db97c5a73ecad9 | 717874003 | 0.042356 | 2 | yes | 717874 | Swift Padded Swimsuit | 57 | Swimsuit | Swimwear | 1010017 | Stripe | 10 | White | 3 | Light | 9 | White | 4242 | Swimwear | B | Lingeries/Tights | 1 | Ladieswear | 60 | Womens Swimwear, beachwear | 1018 | Swimwear | Fully lined swimsuit with a V-neck, narrow adjustable shoulder straps and cups with removable inserts that shape the bust and provide good support. | 1.0 | 1.0 | ACTIVE | Regularly | 21.0 | cc35d8ab7dea6c9d48f1d61c7129f1e626468761dfbdf6904c6afe113e4a7093
1229998 | 1fae3e0134069f937b6edf6a2fd974fd2f70bcda4921a622f06cd4ce605702d5 | 351484027 | 0.017610 | 2 | yes | 351484 | Lazer Razer Brief | 59 | Swimwear bottom | Swimwear | 1010016 | Solid | 42 | Red | 5 | Bright | 18 | Red | 4242 | Swimwear | B | Lingeries/Tights | 1 | Ladieswear | 60 | Womens Swimwear, beachwear | 1018 | Swimwear | Fully lined bikini bottoms with a mid waist, medium coverage at the back and laser-cut, scalloped edges. | NaN | NaN | ACTIVE | NONE | 22.0 | c16e64df118ea27c2d8d57d218832b849c532de79383e9453f032613b25abebe
1229999 | 45f48c39c7f5bbd3f9ee57cc6b424658861e730975cbcb736930c5b4ae8063cf | 674010009 | 0.042356 | 2 | yes | 674010 | SPEED Veronica dress w | 265 | Dress | Garment Full body | 1010016 | Solid | 42 | Red | 5 | Bright | 18 | Red | 1344 | Dresses | D | Divided | 2 | Divided | 53 | Divided Collection | 1013 | Dresses Ladies | Short, V-neck dress with a wrapover front, short flounced sleeves, a concealed fastening at the top and seam with a tie belt at the waist. Unlined. | NaN | NaN | ACTIVE | NONE | 28.0 | 26d9be9873c93e80209643b1eb382f5d10c378363b0a601d5cf755871daaf601

Duplicate rows

Most frequently occurring

index | customer_id | article_id | price | sales_channel_id | sale | product_code | prod_name | product_type_no | product_type_name | product_group_name | graphical_appearance_no | graphical_appearance_name | colour_group_code | colour_group_name | perceived_colour_value_id | perceived_colour_value_name | perceived_colour_master_id | perceived_colour_master_name | department_no | department_name | index_code | index_name | index_group_no | index_group_name | section_no | section_name | garment_group_no | garment_group_name | detail_desc | FN | Active | club_member_status | fashion_news_frequency | age | postal_code | # duplicates
5573 | d00063b94dcb1342869d4994844a2742b5d62927f36843164fb3f818f630bca9 | 678342001 | 0.006763 | 1 | yes | 678342 | Lima SS. | 255 | T-shirt | Garment Upper body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1643 | Basic 1 | D | Divided | 2 | Divided | 51 | Divided Basics | 1002 | Jersey Basic | Fitted T-shirt in soft cotton jersey with a slightly wider neckline with a narrow ribbed trim. | NaN | NaN | ACTIVE | NONE | 27.0 | ecfb1e6aed8dde7c46c955c26185c51a1c21ca5ad6f819febb877c2506138204 | 26
3996 | 94665b46e194622ccdbcadc0170f13a2f8ede1ff6d057d43a19b8938c808b662 | 629420001 | 0.008458 | 2 | yes | 629420 | claudine | 255 | T-shirt | Garment Upper body | 1010016 | Solid | 10 | White | 3 | Light | 9 | White | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | T-shirt in soft cotton jersey. | NaN | NaN | ACTIVE | NONE | 23.0 | 45a5d77c5dc765f23b4ce8b38f30da63359061f8fb0b09e1cd5ac3c0398ade40 | 9
3846 | 8f5f1e993eff204ca7206cabe0fc6dfb75759994cacbf4c32c84ec5699a51c5d | 189634001 | 0.013542 | 2 | yes | 189634 | Long Leg Leggings | 273 | Leggings/Tights | Garment Lower body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1643 | Basic 1 | D | Divided | 2 | Divided | 51 | Divided Basics | 1002 | Jersey Basic | Leggings in stretch jersey with an elasticated waist. | NaN | NaN | PRE-CREATE | NONE | NaN | 8a9ef31b5300ef3aaee31d794ac23c8294ac33042463da92e91b26274257c7f8 | 7
6358 | ef38ec0f0cb29ee8bbb87efc82fd16f4b99127e3eeefe69c9b5fce627e93e270 | 570002001 | 0.012186 | 1 | yes | 570002 | ROY SLIM RN T-SHIRT | 255 | T-shirt | Garment Upper body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 5832 | Light Basic Jersey | F | Menswear | 3 | Menswear | 26 | Men Underwear | 1002 | Jersey Basic | Round-necked T-shirt in soft jersey. | NaN | NaN | ACTIVE | NONE | 24.0 | 2c29ae653a9282cce4151bd87643c907644e09541abc28ae87dea0d1f6603b1c | 7
1360 | 31db71ea558704fd429f0c9bb7f76475bd73577c9bf668d39260f31b9bfbce12 | 728162001 | 0.008458 | 2 | yes | 728162 | Talia (1) | 255 | T-shirt | Garment Upper body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | V-neck T-shirt in cotton jersey with a rounded hem. Slightly longer at the back. | NaN | NaN | ACTIVE | NONE | 55.0 | 0f1582b0c7c263c53a0ab88d134a80c9db6147455e1f94475c09ccc0bced2d4c | 6
3802 | 8de98d98789e2d90eb7d8b3b631e0a7d895aba860124a20e26c388998437b757 | 828047002 | 0.013542 | 2 | yes | 828047 | Blossom tee | 255 | T-shirt | Garment Upper body | 1010016 | Solid | 10 | White | 3 | Light | 9 | White | 1676 | Jersey Basic | A | Ladieswear | 1 | Ladieswear | 16 | Womens Everyday Basics | 1002 | Jersey Basic | Fitted, round-necked T-shirt in ribbed organic cotton jersey. | NaN | NaN | ACTIVE | NONE | 37.0 | 3f38900bac4f9881cb27273e8085e7b9ba1943d7a1d4c6e52965e40c23d6c9ad | 6
5527 | ce79a54991bb7c2c2d9427ae1e7f1d8c8b037f8d74b2fe659e87ad70e73ca6e7 | 570004009 | 0.016932 | 2 | yes | 570004 | PETER POLO | 257 | Polo shirt | Garment Upper body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 5832 | Light Basic Jersey | F | Menswear | 3 | Menswear | 26 | Men Underwear | 1002 | Jersey Basic | Short-sleeved polo shirt in soft jersey with a collar and button placket. | 1.0 | 1.0 | ACTIVE | Regularly | 32.0 | 40d74bff9dfc6f518a5e5ae8154c901606a2eb80a24b47c8121dd537cea03a70 | 6
123 | 04dab48e5805e9c05272604ac78eb5eb941850ce307a7dd4bb5fe4652c0e4915 | 695544001 | 0.033881 | 2 | yes | 695544 | Pluto slacks RW | 272 | Trousers | Garment Lower body | 1010016 | Solid | 9 | Black | 4 | Dark | 5 | Black | 1722 | Trouser | A | Ladieswear | 1 | Ladieswear | 15 | Womens Everyday Collection | 1009 | Trousers | Ankle-length cigarette trousers in stretch satin made from a cotton blend with a zip fly, concealed hook-and-eye fasteners and a regular waist with concealed elastication. Side pockets, fake welt back pockets and tapered legs with creases. | NaN | NaN | ACTIVE | NONE | 56.0 | 2514309f2126697aa9611fa8ad638c89b6b6cbae3ce55ebcbf886bc70e00e224 | 5
538 | 1472c551f2c04873edddc853e214a033692b2b1d6ae2bb14f369d633202f1980 | 852521001 | 0.030492 | 2 | yes | 852521 | MALOU CREW | 252 | Sweater | Garment Upper body | 1010008 | Front print | 11 | Off White | 1 | Dusty Light | 9 | White | 1660 | Jersey | A | Ladieswear | 1 | Ladieswear | 6 | Womens Casual | 1005 | Jersey Fancy | Boxy top in sweatshirt fabric with a motif on the front, low dropped shoulders and long sleeves with decorative seams. Ribbing around the neckline, cuffs and hem. Soft brushed inside. | NaN | NaN | ACTIVE | NONE | 48.0 | 45da989c2203268d20ff768b1c723c1f97fcbacd2daf67648ad3d89ae25bcadd | 5
2655 | 61da44a2758206d5701771f4315637b40c8321b511191654fb1430a6408e4dfa | 507909001 | 0.021593 | 1 | yes | 507909 | Rebecca or Delphine shirt | 259 | Shirt | Garment Upper body | 1010016 | Solid | 10 | White | 3 | Light | 9 | White | 1515 | Blouse | A | Ladieswear | 1 | Ladieswear | 11 | Womens Tailoring | 1010 | Blouses | Gently tailored shirt in a stretch cotton blend with a turn-down collar, V-neck, buttons down the front and buttoned cuffs. | 1.0 | 1.0 | ACTIVE | Regularly | 29.0 | fce375cd69ffecf89cbc59ce8ee3b69436c86c469adc269b6305f7607e6006e6 | 5